Unfortunate oversight


Scientists must remember that however irrelevant their involvement in industry might seem 
to them, others will see it differently — only full disclosure will avert the taint of scandal. 


Hydraulic fracturing, or ‘fracking’, a technology that revolutionized the natural-gas industry, has been surrounded by controversy in recent years. So, when environmental experts at the University of Texas at Austin produced a report in February that gave the technique a fairly clean bill of health, they received widespread news coverage, including in the pages of Nature (see Nature 482, 445; 2012). The study was billed as an independent analysis. Yet last week it emerged that its lead author is a well-paid board member of an energy company that is actively involved in fracking.

The failure to declare this involvement was an unfortunate mistake, not least because the man who made it is a respected senior scientist who headed the US Geological Survey under US presidents Bill Clinton and George W. Bush — and is therefore experienced enough to understand the role that politics and perception play in sensitive issues such as energy development. Yet Charles ‘Chip’ Groat, associate director of the University of Texas at Austin Energy Institute, failed to disclose that he holds a significant number of shares in the Houston-based Plains Exploration & Production Company, and that he earned more than US$400,000 from the company last year. In a 23 July statement to Bloomberg News, he said that disclosing his position on the board “would not have served any meaningful purpose relevant to this study”.

Groat says that his position on the board did not affect the outcome of the study and that he did not interfere with the findings of his colleagues. The study found no evidence of groundwater contamination from fracking, which pumps fluid into the ground at high pressure to fracture geological formations and release natural gas or oil. The technology has been in use for decades and, practised properly, the report suggested, it is safe and poses little risk to the environment.

This over-arching conclusion seems reasonable in view of what we 
know today, although scientists continue to sift through contradictory 
evidence. And Groat’s explanation of his role also sounds plausible 
— but that is all the more reason for him to have openly disclosed his 
ties to the industry. 

After the link was revealed by the Public Accountability Initiative, 
a non-profit watchdog in Buffalo, New York, university officials 
announced plans to review the study. But even if the review exonerates 
the panel and endorses its findings, it is unlikely to remove the taint of 
scandal. Rather than cutting through the confusion on fracking, the 
report is likely to contribute to it. 

Experts in many fields bounce between academia, government and 
industry during their careers. Universities could not exclude people 
who have industry connections from their ranks, nor would they want 
to. The same goes for government. There is also nothing inherently 
wrong with universities accepting donations from industry to conduct 
studies, as long as the proper protections are put in place. The key is 
transparency, because that is the basis for trust between institutions 
and the wider public, which is especially important when people are 
buffeted by confusing, contradictory and inflammatory information. 
What the public needs, and what scientists must deliver, is reliable 
information that is honest about both its methods and its inevitable 
biases. What it needs is full disclosure.


Marching orders 


Scientists unhappy with policy are right to take 
to the streets. 


Last month, about 2,000 researchers marched on Parliament 
Hill in Ottawa, carrying a coffin that signified, they said, the 
“death of evidence”. The scientists were protesting against a series 
of cuts by Canadian Prime Minister Stephen Harper's conservative 
government that they believed threatened basic research and under- 
mined expert advice in areas such as environmental policy. And in 
May, physical scientists drove a horse-drawn Victorian hearse to the 
British Prime Minister's residence in Downing Street, London, this 
time to mark the demise of UK science. 
The Downing Street stunt was to protest against moves made by the main public funder of UK physical-sciences research, the Engineering and Physical Sciences Research Council (EPSRC), to cut the number of proposals it receives and to prioritize research that addresses national priorities or comes with economic spin-offs (see page 20).

The mock funeral — an idea so good that scientists had it twice.

Echoing their Canadian counterparts, the scientists argued that the 
changes would endanger blue-skies research in chemistry, physics and 
mathematics. But unlike Canada’s protests, the UK campaign has yet 
to win support from the wider scientific community. 

In part, that is because the campaign targets a single, specific 
funder and so is not seen as relevant to UK science as a whole. Some 
researchers have dismissed the coffin parade as an overreaction to a
spat between a few disenfranchised scientists and the EPSRC. Others 
worry that a public protest that exposes disunity in the ranks of science 
at a time of economic chaos could result in cuts to the science budget. 

Perhaps, but if it is an isolated spat, then why did people with little 
personal stake in the EPSRC’s policies join in the protests? And the 
calls by dissenters to close ranks — to keep calm and to carry on — 
ignore the fact that science funding is a political question. To make a 
point in a political arena, scientists must stand up and be counted.
WORLD VIEW

The time is right to confront misconduct

After a generation of denial, research leaders are finally treating scientific fraud with the seriousness it deserves, says Colin Macilwain.

One problem with having worked as a journalist for a long time is that every story comes with a feeling of déjà vu. You keep thinking: I’ve been here before. So it is refreshing to report one issue where something has actually changed: the vexed and perennial problem of research misconduct, which scientific leaders are finally taking seriously. Talking to several leaders in recent weeks, I have found that their mood has hardened — and not before time.

For too long, scientists’ instinctive defensiveness has produced general denial that misconduct constitutes a serious problem.

I arrived in Washington DC to work for Nature in 1993, in the aftermath of congressional hearings into allegations of misconduct involving a paper by biologists David Baltimore and Thereza Imanishi-Kari at the Massachusetts Institute of Technology in Cambridge. The researchers were correctly found innocent. But the case led an independent commission chaired by reproductive biologist Kenneth Ryan to call for a much more rigorous approach to the investigation of misconduct.

Ryan was shot down in flames by scientific officials and his recommendations were ignored. They were delivered to the US Department of Health and Human Services, which kicked them upstairs to the White House. The administration of then-president Bill Clinton sat on the findings until 2000, when it issued a bland federal misconduct decree. And that was in the United States — the world’s dominant scientific power and the one that had done the most to address misconduct.

Countermeasures elsewhere have been even feebler. In Germany, for example, no university had an integrity officer until 2011, and it is still difficult for institutions there to sanction proven fraudsters. Some judges consider academic freedom of expression to be paramount — and say that it would be violated if a university were to ask scientists to retract a paper.

Worldwide, however, research integrity is now very much in the spotlight. Prominent cases in the United Kingdom, South Korea, the Netherlands and Canada in recent years have each had a disturbing and powerful impact in their respective locales.

Considerable hard data have emerged on the scale of misconduct. A metastudy (D. Fanelli PLoS ONE 4, e5738; 2009) and a detailed screening of all images in papers accepted by The Journal of Cell Biology (M. Rossner The Scientist 20 (3), 24; 2006) each suggest that roughly 1% of published papers are fraudulent. That would be about 20,000 papers worldwide each year.

At the time of the Baltimore case, it was widely argued that research misconduct was insignificantly rare — and irrelevant to the progress of science, which would self-correct. Few senior scientists now believe that. They know that misconduct exists and that, unchecked, it can undermine public regard for science and scientists.

Two major studies to be released in the next year reflect this shift in 
attitude. Significantly, they have been instigated by leading scientists. 
One study, by the InterAcademy Council, is looking at international 
aspects of misconduct. Sharp disparities in investigative procedures 
— and the lack of any such procedures, or responsible officials, at 
many institutions outside the United States — are problematic, given 
that an increasing proportion of research involves collaborators from 
more than one country. 

Robbert Dijkgraaf, co-chairman of the InterAcademy Coun- 
cil, is one of the people leading the study. He hopes that, when its 
findings are released this year, governments and research agencies 
around the world will use them as a template 
to improve training and enforcement of good 
research conduct. 

The second study, by the US National Acad- 
emy of Sciences, will report in 2013. It is likely to 
call for far-reaching changes in how US agencies 
define and police misconduct. Since the 2000 
decree, agencies have regarded only ‘falsifica- 
tion, fabrication and plagiarism’ as misconduct: 
the academy may call for this definition to be 
widened in line with an emerging global con- 
sensus to include most other sorts of unethical 
behaviour, such as falsely attributed authorship. 

Last December, for example, Canada established a Tri-Agency Framework for the Responsible Conduct of Research at its main funding agencies. The framework oversees publicly and privately funded research and has a secretariat to support university misconduct investigations.

Britain is also finally taking some faltering steps to address the 
issue. In July, universities adopted a voluntary concordat that obliges 
them to investigate misconduct allegations. Some research leaders 
want to leave it at that but others, led by Michael Rawlins, chairman of 
the UK National Institute for Health and Clinical Excellence, demand 
further action to ensure that cases are properly investigated. 

Current scientific leaders have the opportunity to take the initiative and stamp down hard on fraud. Next year’s National Academy study won’t use language as divisive as Ryan’s, but it could usher in a more consistent US system to handle misconduct, which could percolate around the globe. The international report will help governments and agencies to pursue miscreants across borders. Together, the studies represent a historic opportunity to deal with what is, perhaps, the single most potent threat to science’s prestige.


Colin Macilwain writes about science policy from Edinburgh, UK. 
e-mail: cfmworldview@gmail.com 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


Blind mice can 
sense light 


A small molecule restores light 
sensitivity to blind mice when 
it is injected into their eyes. 

Richard Kramer at the 
University of California, 
Berkeley, and his colleagues 
studied a mouse model of 
retinitis pigmentosa — a 
form of blindness in which 
light-sensing rod and cone 
cells in the retina degenerate. 
The researchers applied the 
molecule, AAQ, to retinas 
isolated from the mice and 
found that it triggered retinal 
ganglion cells — most of which 
are normally light-insensitive 
— to increase their firing rate 
in response to light. Other 
work has suggested that the 
molecule functions by blocking 
potassium ion channels in 
the membranes of neurons, 
boosting their excitability. In 
behavioural tests, blind mice 
treated with AAQ showed signs 
of light sensitivity. 

The use of this and related 
molecules could restore 
vision less invasively than 
other proposed methods, the 
researchers say. 
Neuron 75, 271-282 (2012) 


Skin bacteria 
boost immunity 


Microbes living in mammalian guts have an important role in intestinal immunity, and it seems that those living on the skin are similarly crucial for tuning immune responses to skin pathogens.

Yasmine Belkaid at the US National Institute of Allergy and Infectious Diseases in Bethesda, Maryland, and her colleagues compared mice with microbes on their skin with germ-free mice raised in aseptic conditions. T cells, a subset of immune cells, produced fewer immune-stimulating molecules in germ-free animals than in control mice. When infected with a skin parasite, the germ-free mice developed a greater number of parasites per skin lesion than the controls, and also showed impaired T-cell responses. Populating the skin with a skin bacterium restored immunity to the germ-free animals.

Science http://dx.doi.org/10.1126/science.1225152 (2012)

For a longer story on this research, see go.nature.com/8ahyc3


Hunter-gatherer genes

Three African populations that rely mainly on hunting and gathering possess a trove of previously unrecorded genetic diversity.

Sarah Tishkoff at the University of Pennsylvania in Philadelphia and her team sequenced the full genomes of five individuals from each of three populations: Cameroonian Pygmies, and the Hadza (pictured) and Sandawe people from Tanzania. The researchers’ trawl uncovered 13.4 million variants — more than 3 million of which have never been seen before.

Genes involved in immunity, metabolism, taste, smell and reproduction seem to have evolved since the different populations split — a sign of adaptation to local environments. In the Pygmies, recent changes in genes involved in the function of the pituitary gland, which secretes growth and other hormones, could explain their short stature. All the hunter-gatherers sampled showed signatures of gene flow from now-extinct human species. Such gene flow had previously been seen mainly in non-African populations, and the finding supports the idea that breeding between various human species occurred regularly.

Cell http://dx.doi.org/10.1016/j.cell.2012.07.009 (2012)

For a longer story on this research, see go.nature.com/ss7rzr


Rechargeable 
Li-air battery 


Lithium-air batteries
promise to greatly exceed
the energy-storage capacity
of conventional lithium-ion
batteries, and a study shows
that they can retain 95% of 
their capacity even after 
100 recharges. 

Current is generated in 
lithium-air batteries when 
lithium ions from the anode 
react with oxygen from the 
air — rather than with a
limited volume of oxidizing 
agent, as in conventional 
batteries. Peter Bruce 
and his colleagues at the 
University of St Andrews, 
UK, created a lithium-air 
battery using an electrolyte of 
dimethylsulphoxide, through 
which the lithium ions flow, 
and a porous gold cathode 
where oxygen is reduced 
before it reacts with the 
lithium ions. These materials 
seem to prevent the side- 
reactions that have quickly 
degraded the performance of 
previous lithium-air batteries.
Science http://dx.doi. 
org/10.1126/science.1223985 
(2012) 


ANIMAL BEHAVIOUR 


Sex is costly 
for squid 


For a squid, mating can take up 
to three hours, and the resulting 
energy losses could put the 
animal at a disadvantage 
around predators and reduce
foraging opportunities.

Amanda Franklin and her 
colleagues at the University 
of Melbourne in Victoria, 
Australia, collected wild 
dumpling squid (Euprymna 
tasmanica; mating male and 
female pictured). They tested 
the creatures’ swimming 
endurance in a tank with a
constant current, before and 
after the squid mated. Mating 
halved the time taken for 
males and females to become 
exhausted, but both regained 
their energy within 30 minutes 
of copulation. 

Knowing this cost could 
contribute to a better 
understanding of the evolution 
of reproductive behaviours, 
such as promiscuity, in squid, 
the researchers say. 

Biol. Lett. http://dx.doi. 
org/10.1098/rsbl.2012.0556 
(2012) 


Light control in 
monkey brains 


Using a technique that 
makes it possible to control 
the activity of specific 
engineered neurons with light, 
neuroscientists have modified 
behaviour in primates. 
Optogenetics can help 
researchers to figure out the 
role of individual neurons, but 
it has previously been used 
to control behaviour only in 
rodents and invertebrates. 
Wim Vanduffel at Harvard 
Medical School in Boston, 
Massachusetts, and his 
colleagues genetically modified 
neurons in the premotor and 
prefrontal cortex brain regions 


of two rhesus monkeys, so 
that the neurons fired when 
blue light was delivered into 
the brain by an optical cable. 
Stimulating these neurons as 
the monkeys performed an 
eye-movement task changed
the latencies of eye movements 
in both animals. Functional 
magnetic resonance imaging 
revealed that the stimulation 
induced distinct patterns of 
brain activity during the task. 
Curr. Biol. http://dx.doi. 
org/10.1016/j.cub.2012.07.023 
(2012) 


Resilient to
natural disasters

A study of ancient volcanic ash found at key archaeological sites suggests that Neanderthals (pictured) and early modern humans were more resilient to climate change and natural disasters than is often assumed.

John Lowe at Royal Holloway, University of London, in Egham, UK, and his colleagues analysed microscopic shards of volcanic ash from a major eruption that occurred in Europe some 40,000 years ago. The volcano spewed so much climate-cooling ash that the event probably created winter-like conditions. Because the researchers found the ash at several archaeological sites in Europe and North Africa, they were able to link events in Neanderthal and human evolution with the timing of climatic changes. Early modern humans started to displace Neanderthals from parts of Europe before the eruption and subsequent cooling, and their activities appear to have been unaffected by these events. Indeed, in parts of central and eastern Europe, Neanderthals seem to have become extinct well before the eruption occurred. Early modern humans probably placed greater pressure on Neanderthals than did volcanic eruptions or climate change, the researchers suggest.

Proc. Natl Acad. Sci. USA http://dx.doi.org/10.1073/


COMMUNITY CHOICE

The most viewed papers in science

Aerosols keep down monsoon rain

Most viewed in the week of 23 July

Tiny airborne particles called atmospheric aerosols tend to reduce summer monsoon rainfall over most of South Asia.

Dilip Ganguly and his colleagues at the Pacific Northwest National Laboratory in Richland, Washington, used a simplified atmosphere-ocean model to simulate the effects of changes in the levels and composition of atmospheric aerosols — from local and distant sources — on South Asia’s mean monsoon rainfall. Increased local emissions of aerosols such as black carbon — which absorbs sunlight and produces a warming effect that tends to reduce cloud cover — weakened the monsoon rains in most of South Asia. Aerosols from outside Asia also contributed to the overall reduction in rainfall.

Only over northwest India, where aerosol emissions from local forest and grass fires are thought to be decreasing, did the mean summer monsoon rainfall increase.

J. Geophys. Res. http://dx.doi.org/10.1029/2012JD017508 (2012)


GENE THERAPY

Gene fix repairs
hearing

Gene therapy has restored hearing for up to 18 months in mice that were born deaf.

The animals are missing the gene that encodes the protein VGLUT3. Lack of VGLUT3 renders inner hair cells of the ear’s cochlea incapable of sending electrical signals to the brain. Lawrence Lustig at the University of California, San Francisco, and his team used a virus to deliver the Vglut3 gene into the cochleas of these mice. After one week, the researchers detected auditory responses in the creatures’ brains, and within two weeks, the animals showed an increased startle response to sound.

The results could bode well for humans, the researchers suggest, because VGLUT3 is also associated with a rare form of human deafness.

Neuron 75, 283-293 (2012)


NATURE.COM
For the latest research published by Nature visit: www.nature.com/latestresearch
SEVEN DAYS


Stem-cell ruling 

The US Food and Drug 
Administration (FDA) has 
now been given legal backing 
for its attempts to regulate a 
US clinic that offers therapies 
involving a patient’s processed 
stem cells. Such treatments can 
now be classified as drugs, a US 
District Court in Washington 
DC ruled on 23 July, in relation 
to the FDA’s injunction against
Regenerative Sciences, a 
stem-cell clinic in Broomfield, 
Colorado. The ruling could 
pave the way for the agency to 
regulate other stem-cell clinics. 
See page 14 for more. 


Data exemption 

A cross-party group of 
politicians has recommended 
that England’s laws on 

freedom of information be 
modified to protect universities 
from having to release data 
prematurely. The nation’s 
universities have complained 
that the Freedom of 
Information Act might be used 
to force the release of research 
findings and data before they 
are ready for publication. On 26 
July, Parliament's Justice Select 
Committee agreed, saying that 
the existing ‘pre-publication
exemption’ section of the
act should be amended. See
go.nature.com/cflfia for more.


Lab-death charges 


A landmark criminal prosecution over an accident in a US academic laboratory reached a partial conclusion on 27 July. In a deal that saw criminal charges dropped, the regents of the University of California accepted responsibility for laboratory conditions three-and-a-half years ago, when 23-year-old Sheharbano Sangji died in a lab fire at the University of California, Los Angeles. The regents also agreed to put in place stringent safety measures and to set up a US$500,000 environmental-law scholarship in Sangji’s name. But charges remain against Sangji’s supervisor, the organic chemist Patrick Harran; his case has been postponed until 5 September. See go.nature.com/hmoden for more.


Industry ties

Doubt has been cast on a supposedly independent study into the risks of fracking (the pumping of high-pressure fluids into shale to force out natural gas) after its lead author confirmed last week that he is on the board of directors of an energy company actively involved in the practice, a position that earned him more than US$400,000 last year. Charles Groat, of the University of Texas at Austin, did not disclose his industry ties when the report (see go.nature.com/sopiwm) was released in February. The university says that it is reviewing the study. See page 5 for more.


Ebola outbreak

The first widespread outbreak of Ebola haemorrhagic fever since 2009 has killed 14 people in the Kibaale district of western Uganda, the World Health Organization said on 29 July. Twenty cases have been reported since the beginning of July, but the presence of ebolavirus was not officially confirmed until last week. After the virus spread to the capital, Kampala, Ugandan President Yoweri Museveni told people to avoid physical contact. According to the Uganda Virus Research Institute in Entebbe, the outbreak involves the Sudan subtype of the virus, which in a 2000-01 Ugandan outbreak killed 224 people — 53% of identified cases.


India curbs tiger tourism

India’s Supreme Court has placed an interim ban on tourists visiting central parts of the country’s 40 or so tiger reserves, to protect the dwindling population of the endangered big cats. The 24 July ruling — which the court will re-examine on 22 August — still allows tourists into fringe areas of reserves (‘buffer zones’). Park managers said that the ruling would devastate tourism in some national reserves, but that other parks would hardly be affected, because they already keep core areas off-limits. The order came in response to a petition from conservationist Ajay Dubey at the non-governmental organization Prayatna in Bhopal. According to a 2010 census, India is home to about 1,700 wild tigers — more than half of the world’s total.


Greenland melt 


Satellite observations revealed 
massive surface melting across 
the Greenland ice sheet last 
month as a dome of unusually 
hot air settled over the region, 
NASA scientists announced on 
24 July. Between 8 and 12 July, 
the area subject to melting 
increased from 40% to 97% 

of the ice sheet — an extent 
unprecedented in three decades 
of space observations. The 
previous record was 55%. But 
the event falls within the realm of natural variability: ice-core records suggest that extreme melting occurs roughly once every 150 years, with the most recent event in 1889.


Warming redux

The Berkeley Earth Surface Temperature (BEST) study released the second part of its independent analysis of the global land-surface-temperature record on 29 July. The findings — that the planet has warmed over the past 250 years owing to human influence — are not news to climate scientists. But team leader Richard Muller, a physicist at the University of California, Berkeley, is being criticized by researchers for publicizing the BEST results before they have been peer reviewed. The BEST team has not yet published any of its findings in journals, despite posting its first results online last October. See go.nature.com/euvydr for more.


FUNDING

Antarctica upgrade

US research facilities in Antarctica, such as the Polar Star icebreaker (pictured), need an overhaul, says a report to the US National Science Foundation, released on 23 July. To pay for the upgrade, the authors recommend that the roughly US$300-million budget for the US Antarctic Program (USAP) be increased by 6%, and that the programme divert 6% of its planned science spending to infrastructure. The USAP devotes nine times more person-days in Antarctica to logistics efforts than it does to actual research, and after the upgrade, the balance should tilt more towards research, the authors add. See go.nature.com/dvb9y9 for more.


Nuclear safety

Japan's new nuclear regulatory commission will probably be headed by radiation physicist Shunichi Tanaka. On 26 July, a parliamentary committee (covering both lower and upper houses) proposed Tanaka as head of the commission, which will be launched in September and will be affiliated with the environment ministry. But some observers objected to Tanaka, a former deputy chair of the cabinet’s Japan Atomic Energy Commission, accusing him of being too close to the nuclear industry and of playing down the health risks from last year’s disaster at the Fukushima Daiichi nuclear plant.


The Smithsonian Institution’s National Museum of Natural History in Washington DC has announced its next director: Kirk Johnson, a geologist who specializes in plant fossils from the Cretaceous period. Johnson currently serves as vice-president of research and collections and as chief curator at the Denver Museum of Nature & Science in Colorado. On 29 October, he will take charge of the Smithsonian museum's US$68-million budget and 126 million artefacts and specimens. Johnson replaces Cristian Samper, who stepped down on 23 January to head the Wildlife Conservation Society in New York.


Misconduct charge

A once high-flying Danish neuroscientist, Milena Penkowa, is suspected of “potentially intentional misconduct” involving 15 research papers, according to a leaked report from an international committee investigating her case. The report — published on 25 July by Danish newspaper BT but due to be released officially in August — had been requested by the University of Copenhagen, Penkowa’s former employer, in February 2011. Penkowa had already resigned and been sentenced for embezzling money from the Danish Society for Neuroscience. Two of her papers have been officially retracted, and a report from the Danish Committee on Scientific Dishonesty is also expected later this summer. See go.nature.com/eakrbd for more.


TREND WATCH

US venture capitalists seem to be avoiding the biotechnology sector in favour of information technology, according to numbers released on 20 July in the US National Venture Capital Association's ‘MoneyTree’ report. Investments in biotech firms dropped to a 9-year low of around US$700 million in the second quarter of 2012. And they accounted for fewer than 10% of all venture-capital deals in that quarter — a much lower proportion than usual (see chart).

BIOTECH VENTURE-CAPITAL DIPS
[Chart] Investment in biotechnology firms by US venture capitalists dropped sharply in the first half of 2012. Series: US biotech venture-capital funding (US$ billion); biotech share of total US venture capital (%). Source: Thomson Reuters/www.pwcmoneytree.com


6 AUGUST

NASA tries to land its Curiosity rover on Mars. See page 16 for more.
mars.jpl.nasa.gov/msl


5-10 AUGUST

The Ecological Society of America meets in Portland, Oregon, to discuss preserving, utilizing and sustaining Earth’s ecosystems.
www.esa.org/portland


Physics millions 


A lucrative prize for 
fundamental physics was 
launched on 31 July, with nine 
researchers each receiving 
US$3 million. The prize is 
sponsored by Yuri Milner, 

a Russian billionaire who 
once studied for a physics 
PhD. Milner chose the first 
winners: Alan Guth, Andrei 
Linde, Nima Arkani-Hamed, 
Juan Maldacena, Nathan 
Seiberg, Edward Witten, 
Alexei Kitaev, Maxim 
Kontsevich and Ashoke Sen, 
who will, in turn, select future 
winners. The prize will be 
announced annually and is 
accompanied by an ad hoc 
award for ‘exceptional cases’ 
and a US$100,000 prize for
promising junior researchers. 
See go.nature.com/mwaays for 
more. 


Cancer stem cells tracked


The master builders that underlie tumour growth may inform treatment strategies. 


BY MONYA BAKER 


( ancer researchers can sequence tumour 
cells’ genomes, scan them for strange 
gene activity, profile their contents for 

telltale proteins and study their growth in labo- 
ratory dishes. What they have not been able 
to do is track errant cells doing what is more 
relevant to patients: forming tumours. Now 
three groups studying tumours in mice have 
done exactly that (refs 1–3). Their results support the
ideas that a small subset of cells drives tumour 
growth and that curing cancer may require 
those cells to be eliminated. 

It is too soon to know whether these results 
— obtained for tumours of the brain, the gut 
and the skin — will apply to other cancers, says 
Luis Parada at the University of Texas
Southwestern Medical Center in Dallas, who led the
brain study (ref. 2). But if they do, he says, “there is
going to be a paradigm shift in the way that
chemotherapy efficacy is evaluated and how 
therapeutics are developed”. Instead of test- 
ing whether a therapy shrinks a tumour, for 
instance, researchers would assess whether it 
kills the right sorts of cell. 

Underlying this scenario is the compelling 
but controversial hypothesis that many tumours 
are fuelled by ‘cancer stem cells’ that produce the 
other types of cancer cell, just as ordinary stem 
cells produce normal tissues. Previous studies 
have tested this idea by sorting cells from a can- 
cer biopsy into subsets on the basis of factors 
such as cell-surface markers, and injecting them 
into laboratory mice. In principle, those cells that
generate new tumours are the cancer stem
cells. But sceptics point out that transplantation
removes cells from their natural environment
and may change their behaviour. “You can see
what a cell can do, but not what cells actually
do,” says Cédric Blanpain of the Free University
of Brussels, who co-led the skin study (ref. 1).

All three research groups tried to address this 
knowledge gap by using genetic techniques to 
track cells. Parada and his co-workers began 
by testing whether a genetic marker that 
labels healthy adult neural stem cells but not 
their more specialized descendants might
also label cancer stem cells in glioblastoma, a 
type of brain cancer. When they did so, they 
found that all tumours contained at least a 
few labelled cells — presumably stem cells. 
Tumours also contained many unlabelled
cells (ref. 2). The unlabelled cells could be killed
with standard chemotherapy, but the tumours
quickly returned. Further experiments showed
that the unlabelled cells originated from
labelled predecessors. When chemotherapy
was paired with a genetic trick to suppress the 
labelled cells, Parada says, the tumours shrank 
back into “residual vestiges” that did not resem- 
ble glioblastoma. 

Meanwhile, Hans Clevers, a stem-cell biolo- 
gist at the Hubrecht Institute in Utrecht, the 
Netherlands, and his colleagues focused on the 
gut. They had previously shown that a genetic
marker that labels healthy gut stem cells also
labels stem cells in benign intestinal tumours,
which are precursors of cancer (ref. 4). In their
latest study (ref. 3), he and his team engineered mice to
carry a gene for a drug-inducible marker that, 
when activated, causes labelled cells to make 
molecules that fluoresce one of four colours. 
This experiment yielded single-colour tumours 
consisting of several cell types, suggesting that 
each tumour arose from a single stem cell. To 
check that stem cells continued to fuel the 
tumours, Clevers added a second, low dose of 
the drug, triggering a few of the stem cells to 
change colour. This produced streams of cells 
in the new colour, showing that stem cells were 
consistently producing the other cell types. 

For the skin study, Blanpain and his group 
labelled individual tumour cells, without targeting
stem cells specifically (ref. 1). They found that cells
showed two distinct patterns of division: they 
either produced a handful of cells before peter- 
ing out, or went on to produce many cells. Once 
again, the results pointed to a distinct subset of 
cells as the engine of tumour growth. What's 
more, as tumours became more aggressive, they 
were more likely to produce new stem cells — 
which can divide indefinitely — and less likely 
to produce differentiated cells, which can divide 
only a limited number of times. That could be a
key to halting tumour development early, says 
Blanpain. Rather than eradicating cancer stem 
cells, for example, therapies could try to coax 
them to differentiate into non-dividing cells. 

The papers provide clear experimental 
evidence that cancer stem cells exist, says 
Robert Weinberg, a cancer researcher at the 
Whitehead Institute in Cambridge, Massachu- 
setts. “They have made a major contribution to 
validating the concept of cancer stem cells,” he 
says. But cancer cells probably also act in more 
complex ways than those observed, he warns. 
For example, non-stem cells within the tumour 
might de-differentiate into stem cells. 

The next step, the three groups say, is figur- 
ing out how the cells tracked in these experi- 
ments relate to putative cancer stem cells 
identified by years of transplantation studies. 
Researchers are already busy hunting for ways 
to kill these cells; now they have more tools to 
tell whether such a strategy will work.


1. Driessens, G., Beck, B., Caauwe, A., Simons, B. D. & Blanpain, C. Nature http://dx.doi.org/10.1038/nature11344 (2012).

2. Chen, J. et al. Nature http://dx.doi.org/10.1038/nature11287 (2012).

3. Schepers, A. G. et al. Science http://dx.doi.org/10.1126/science.1224676 (2012).

4. Barker, N. et al. Nature 457, 608–611 (2009).
THERAPEUTICS


FDA’s claims over 
stem cells upheld 


Drug watchdog wins right to regulate controversial therapies.


BY DAVID CYRANOSKI 


A court decision on 23 July could help
tame the largely unregulated field
of adult stem-cell treatments. The US
District Court in Washington DC affirmed 
the right of the Food and Drug Administra- 
tion (FDA) to regulate therapies made from 
a patient’s own processed stem cells. The case 
hinged on whether the court agreed with the 
FDA that such stem cells are drugs. 

The judge concurred, upholding an injunc- 
tion brought by the FDA against Regenera- 
tive Sciences, based in Broomfield, Colorado. 
Under the treatment sold by the firm, stem
cells are isolated from patients’ bone marrow,
processed, and the resulting cells injected
back into the patients to treat joint pain. The
FDA calls this procedure the “manufacturing,
holding for sale, and distribution of an unapproved
biological drug product”, and in August 2010
ordered Regenerative Sciences to stop offering
the treatment (see Nature 466, 909; 2010).

During investigations leading up to the 
injunction, the FDA also found that, because 
of flaws in its cell processing, the company 
was violating regulations on “adulteration” 
that are meant to ensure patients’ safety. 

Jeanne Loring, a regenerative-medicine 
scientist at the Scripps Research Institute in 
La Jolla, California, says that the decision 
will send a warning to other entrepreneurs 
offering unapproved stem-cell treatments. 
“So many people want to start these companies.
They say, ‘FDA? What FDA?’”

Chris Centeno, the medical director of 
Regenerative Sciences and one of two major- 
ity shareholders, told Nature that he plans to 
appeal against the ruling. During the case, 
the company claimed that the cells in its 
‘Regenexx’ procedure are not significantly 
modified before they are reinjected, so the 
procedure should be considered routine 
medical practice. The company also argued 
that because all the processing work is done in 
Colorado, the procedure should be subject to 
state law, rather than to regulation by the FDA.

The court disagreed on both counts, 
noting that “the biological characteristics of 
the cells change during the process”, and that
this, together with other factors, means the 
cells are more than “minimally manipulated”. 

Leigh Turner, a bioethicist at the Univer- 
sity of Minnesota in Minneapolis, agrees. “It 
is much too simplistic to think that stem cells 
are removed from the body and then returned 
to the body without a ‘manufacturing process’ 
that includes risk of transmission of communicable
diseases,” he says. “Maintaining
the FDA’s role as watchdog and regulatory
authority is imperative.”

Centeno says that the FDA injunction 
applies to only one of his company’s four 
stem-cell products — one that requires 
4-6 weeks of processing. The procedure will 
still be available: after the 2010 injunction, the 
company moved its treatment location to an 
affiliated Cayman Islands clinic.

Centeno plans to continue providing the 
other three procedures, also used for joint 
pain, in the United States. In those treatments, 
the cells are reinjected within two days. Cen- 
teno claims that those cells are “minimally 
manipulated”, and that the FDA sees them as 
the “practice of medicine” and “has no issues” 
with them. Indeed, until 25 July, a graphic on 
the Regenerative Sciences website claimed 
that these three procedures were “FDA 
approved”. 

In fact, the FDA has not approved these 
procedures, and Centeno did not provide 
documentation to support his claims that the 
agency views the three treatments as outside 
its purview. The graphic was removed after 
Nature’s enquiries. 

Doug Sipp, a stem-cell ethics and regula- 
tion expert at the RIKEN Centre for Devel- 
opmental Biology in Kobe, Japan, worries 
that more stem-cell companies might now 
set up shop outside the United States to 
avoid regulation, as Regenerative Sciences 
has done. “Other US stem-cell outfits have 
close ties with partner clinics in Mexico and 
other neighbouring countries, which are tra- 
ditionally regulatory havens for other forms 
of fringe medicine as well. I suppose it will be
business as usual in such places,” Sipp says.
Mouth lesions are a sign of rinderpest, which has long decimated cattle throughout the world.


INFECTIOUS DISEASE 


Officials act to secure 
cattle-plague virus 


Risk of accidental reintroduction shadows rinderpest eradication effort.


BY DECLAN BUTLER 


Rinderpest, a devastating cattle disease,
has not been seen in the wild for a
decade, but it lives on in scores of labs.
Twelve months after the world celebrated the 
success of a years-long vaccination campaign 
that made rinderpest only the second disease 
after smallpox to be eradicated, animal-health 
authorities are turning to the next task: making 
sure that a lab release, accidental or intentional,
doesn’t lead to a resurgence.

Rinderpest is as deadly to cattle as highly
pathogenic H5N1 avian flu is to chickens. 
In past decades, outbreaks ripped through 
herds and wiped out up to 90% of animals, 
often leaving famine, and sometimes war, in 
their wake. “Its eradication is a huge, huge
achievement that has happened largely under
the radar of most of the virology and scientific
community,” says David Ulaeto, a member of a
seven-person multidisciplinary Joint Advisory 
Committee (JAC) on rinderpest that was set up 
to consolidate the eradication by the Rome- 
based Food and Agriculture Organization of 
the United Nations (FAO) and the Paris-based
World Organisation for Animal Health (OIE).

This October, the JAC will probably issue 
the first of a series of guidelines for an inter- 
national oversight system. With the help of 
ad hoc expert groups, the JAC would approve 
official repositories of the virus and ensure that 
they meet tough biosafety standards. The com- 
mittee would also approve all future research 
on live rinderpest virus to ensure that its 
benefits outweigh the risks. 

The FAO and OIE don’t have the authority
to impose such measures on member states, 
but last year, countries gave the organiza- 
tions a mandate by endorsing a moratorium 
on research and declaring that the remaining 
virus samples should be destroyed or shipped 
to approved high-security labs. The approach is 
modelled on the post-eradication phase of the 
smallpox campaign, which saw the number of 
labs holding the virus reduced from 76 in 1976
to just 2 in 1984.

To identify labs that might still hold rinderpest
virus, the FAO carried out extensive literature
searches, liaised with ministries of agriculture and veterinary
services worldwide, and wrote “to virtually 
everyone they could think of”, says Ulaeto. 
By last week, the FAO and OIE had identified 
some 40 labs. “They were a bit surprised at how 
many laboratories did have virus,” he says.

The list remains confidential, but it includes 
labs from some 20 countries, thought to be 
mainly in Africa, the Middle East and Asia, 
where rinderpest outbreaks were common 
until recently, and a handful of established rin- 
derpest research centres, such as the Institute 
for Animal Health in Pirbright, UK, and the 
Plum Island Animal Disease Center in New 
York state. One worrying aspect was that some 
virus samples were found to be held in facilities 
that had inadequate biosafety levels. 

Fears of an accidental release are grounded 
in experience. After smallpox was eradicated, 
a lab accident in Birmingham, UK, resulted 
in two infections and one death. And an acci- 
dental release of foot-and-mouth virus from 
the Pirbright facility, which houses a high- 
biosecurity, world-reference laboratory for 
both foot-and-mouth and rinderpest, caused 
an outbreak in the United Kingdom in 2007. 

Active research on rinderpest has waned as 
the disease has been brought under control 
over the past few decades, says Michael Baron, 
a rinderpest researcher at the Pirbright cen- 
tre. He and others say that the biggest threat is 
from long-forgotten samples of virus from past 
research programmes, and serum and other 
samples collected for diagnostic or other pur- 
poses, that may be lurking in lab freezers. Rin- 
derpest vaccine strains, which are stocked in 
many countries and consist of live attenuated 
virus, are also a concern. In theory, they could 
revert to wild type and cause disease outbreaks. 

Until the world is certain that rinderpest is 
gone for good, vaccine strains will probably 
need to be maintained in high-security labs 
in several regions so that they can be shipped 
swiftly to any outbreak, says Baron. But he says 
that just a couple of pure-research labs would 
be enough to pursue the valuable scientific 
opportunities that rinderpest still offers. 

Although the virus is very similar to the
human measles virus, for example, cattle don’t 
catch measles and humans don't catch rin- 
derpest. Understanding why this is so could 
provide insight into the pathology and basic 
biology of viruses, Baron says. Of more imme- 
diate interest, investigators would also like to 
know whether vaccines can be developed 
against another related virus, the sheep and 
goat disease called peste des petits ruminants, 
that might also protect against rinderpest. That 
would eliminate the need to keep any stocks of 
live attenuated rinderpest virus at all. 

Baron’s home lab contains more than 
100 different rinderpest virus isolates, which 
he says represent “basically the history of the 
disease”. He intends to sequence them all in the 
next few years — so that they can be recreated 
if ever needed — and then destroy them.
NUTRITIONAL SUPPLEMENTS


Lawsuit challenges 
anti-ageing claims 


Former executive sues manufacturer of pill meant to rejuvenate cells. 


BY BRENDAN BORRELL 


“You’re going to be hearing from my
attorney,” Brian Egan told his boss
on his last day of work, nearly a
year ago. Last week, Egan filed a class-action
lawsuit that accuses Telomerase Activation
Sciences (TA Sciences) in New York of engaging
in deceptive business practices in promoting
a proprietary herbal extract intended to
reverse the effects of ageing.

The lawsuit threatens to put the science of 
telomeres — repetitive nucleotide sequences 
that protect the ends of chromosomes during 
DNA replication — on trial. But Noel Patton, 
president of TA Sciences, denies all of the allegations.
“We stand by what we say,” he adds.

The connection between cellular ageing and 
telomere length is rooted in solid research. 
Telomeres become shorter every time a cell 
divides, and when they are lost, cells can no
longer reproduce. The enzyme telomerase 
can lengthen telomeres, possibly slowing or 
reversing degenerative diseases. In one study (ref. 1),
mice genetically engineered to lack functional 
telomerase showed brain degeneration and 
shrunken testes, but those effects were reversed 
when the enzyme was reactivated. 

Such findings have sparked a lot of hype 
and encouraged a cottage industry of compa- 
nies that assess a person's ‘biological age’ on 
the basis of their telomere length. But TA Sci- 
ences has taken the buzz further: it sells a pill 
called TA-65, which it says can lengthen short 
telomeres. The pill brings in an annual revenue 
of US$6 million in the United States alone. 

“A compound that can lengthen telomeres 
would be excellent,” says Carol Greider, a 
molecular biologist at Johns Hopkins Uni- 
versity in Baltimore, Maryland, who shared a 
Nobel prize for her work on how telomerase 
protects chromosomes. But, she adds, “we
would need to test it rigorously”. 

The active ingredient of TA-65 was isolated 
from the herb Astragalus membranaceus and 
patented by Geron, a biopharmaceutical firm 
in Menlo Park, California. Research spon- 
sored by TA Sciences and other companies 
has shown that the compound can lengthen 
telomeres in mice (ref. 2) and humans (ref. 3), but Greider
and others are sceptical of the assay used. 

Calvin Harley, president of Telome Health in 
Menlo Park, spearheaded the studies as chief 
scientific officer at Geron. He stands by the 
conclusion that TA-65 is a “weak telomerase 
activator”. However, TA Sciences sells the pill 
as a nutritional supplement, or ‘nutraceutical’,
rather than a drug, so the firm's health claims 
have not been evaluated by the US Food and 
Drug Administration (FDA). 


DIFFICULT RELATIONSHIP 

In May 2011, Patton hired Egan to help to 
expand TA Sciences’ reach in foreign mar- 
kets. Egan was required to take TA-65 twice 
a day, he later wrote in a discrimination com- 
plaint, so “that I could tell customers that I 
was also taking the product, and that it was 
safe and effective”. Patton denies that it was 
obligatory. 

On 14 September, Egan says, he told Pat- 
ton that he had been diagnosed with prostate 
cancer. The next day, according to Egan, Pat- 
ton fired him and said that his prostate cancer 
could ruin the company. Egan says that when 
he was fired, he was offered a cash settlement 
to keep quiet about his cancer, but turned it 
down. 

Patton denies Egan’s version of events. 
According to an affidavit that Patton filed in 
the discrimination suit, Egan was fired for 
meagre sales. On being told of his dismissal,
Patton alleges in the affidavit, Egan threw his
keys at his boss and demanded to settle things 
“man to man”. An employee broke up the confrontation,
and Egan stormed out, says Patton.
Egan denies Patton's version of events. 

On 19 September, Egan told a potential 
TA Sciences partner in Spain that he had 
developed cancer while taking TA-65. Patton 
and TA Sciences sued Egan for defamation in 
March, saying that he had lost the company 
$2 million in sales. Patton says that TA Sciences
believes that if Egan had cancer, he had it before
he started taking TA-65.

Egan stands by his allegations, and has now
launched a broader attack on the company’s science
with his class-action suit, which he filed on
23 July, in the New York State supreme court,
along with another man who took TA-65. The 
suit challenges statements on TA Sciences’ 
website, including the assertion that TA-65 
can lengthen short telomeres. 

Greider doubts that TA-65 caused Egan’s 
cancer, but agrees that the science behind it 
is murky. A telomere-lengthening compound 
would be a boon to patients dying of bone- 
marrow failure and pulmonary fibrosis, she 
says, and firms could be expected to explore 
its pharmaceutical potential. “I don’t think a 
company would be selling it on the side as a
nutraceutical,” she says.




1. Jaskelioff, M. et al. Nature 469, 102–106 (2011).

2. Bernardes de Jesus, B. et al. Aging Cell 10, 604–621 (2011).

3. Harley, C. B. et al. Rejuvenation Res. 14, 45–56 (2011).
The Curiosity rover prepares
to plunge down to Mars. 


Curiosity 


BY ERIC HAND 


After an eight-month journey to Mars, success for NASA’s
Curiosity rover will hinge on a few crucial moments. The largest
and most complicated piece
of machinery ever sent to the red planet, Curi- 
osity will begin its seven-minute fall through 
the wispy atmosphere at 05:24 UTC on 6 August.
On Earth, mission scientists will be unable to 
do anything but wait and hope for the signal 
that the six-wheeled remote laboratory is rest- 
ing safely in the feeble Martian sunlight. 

If Curiosity lands successfully in Gale Crater, 
it will eventually trundle over to a 5.5-kilometre-tall
stack of layered deposits ringed by
water-altered minerals. Ascending the mound, 
the rover will chart hundreds of millions of 
years of geology and help researchers to deduce 
whether life could ever have existed on Mars. 

But first it has to arrive. On its way down, the 
spacecraft will fire 76 charges, adopt 6 configu- 
rations and slow from 6 kilometres per second 
to a standstill. It will be the first craft since the 
Apollo Moon programme of the 1960s and 
1970s to use a guided-entry system, and the 
final leg of the descent will mark the first use of 
a ‘sky crane’. At 900 kilograms, Curiosity is too
heavy to land in airbags like earlier rovers, and 
retrorockets like those used in the Viking Mars 
landings of the 1970s would kick up damaging 
dust. Instead, a hovering platform will unspool 
the rover. “All sorts of things can go wrong,” said
NASA administrator Charles Bolden at a meeting of the
NASA Advisory Council on 25 July. “That’s what makes it
a real nail-biter.”

NASA officials told the Jet Propulsion Laboratory (JPL)
in Pasadena, California, to ensure a 95% chance of landing
success. Engineers say they have surpassed that: the current
assessment, based on millions of simulations, finds only
a 1.7% risk of failure. But that holds only if the models
have assessed every possible vagary of environment and
machine. What’s worrisome are the unknown unknowns, says
Steven Lee, the mission’s guidance, navigation and
control-systems manager at the JPL. “Probably the overall
biggest risk is our lack of imagination.”

[Graphic: Curiosity’s landing sequence. About 1% of the 1.7% risk of failure is due to potential problems with the parachute. The final touchdown is the second-riskiest part of the landing, after the parachute deployment; its 0.7% risk is divided between different terrain hazards. Rocks and slopes could in rare cases flip the rover over, or Curiosity could land in a crater too deep to escape or on a mesa too steep to descend.]

[Graphic: Rover evolution. The size of a small car at almost one tonne, Curiosity is 5 times heavier than the Spirit and Opportunity rovers and 56 times heavier than Sojourner (1997). Unlike its solar-panelled predecessors, Curiosity is powered by a radioisotope thermoelectric generator.]

16 | NATURE | VOL 488 | 2 AUGUST 2012


BIOLOGY OPENS UP

Quantitative-biology papers are a small but rising fraction of submissions to the arXiv preprint database. (Source: arXiv.org)
Geneticists eye the
potential of arXiv 


Population biologists turn to pre-publication server to gain 
wider readership and rapid review of results. 


BY EWEN CALLAWAY 


The preprint server arXiv.org is perhaps
best known as the preserve of theoretical
physicists and astrophysicists. But 2008
saw an influx of submissions of unpublished
manuscripts, or preprints, by condensed-matter 
physicists who wanted to stake claims to the 
fast-moving subject of iron-based superconduc- 
tors called pnictides. Now the life sciences may 
be on the cusp of their own ‘pnictide moment’,
with population geneticists leading the charge. 

In the past month, leading research groups 
have posted to arXiv high-profile papers on 
the genetic history of southern Africans (ref. 1) and
Europeans (ref. 2). Other prominent population
geneticists have submitted methods-based 
papers to the server, which is hosted by Cornell 
University in Ithaca, New York. The number 
of biology papers on the server is still small in 
comparison with physical-sciences preprints 
(see ‘Biology opens up’), but Paul Ginsparg, a 
theoretical physicist at Cornell who founded 
arXiv in 1991 (ref. 3), welcomes what he hopes 
could be a sea change. 

“It's wonderful if biologists are belatedly 
joining the late twentieth century,” he quips.
“Welcome to the party; better late than never.” 

Life-sciences papers have existed on arXiv 
almost since its inception, but biologists have 
typically shied away from this approach amid 
fears of getting scooped, or of offending jour- 
nals. It was only in 2003 that the site inau- 
gurated a section specifically for papers on 
quantitative biology, or ‘q-bio’ for short. Yet
papers posted to the section in the past tended
to report esoteric models and methods, often 
from physical scientists dabbling in biology. 

Rich in mathematics and steeped in the open- 
data traditions of genomics, population genet- 
ics would seem the ideal candidate to dip its toes 
into pre-publication, which brings the advan- 
tages of speed and open discussion. “I grew 
up in the physics community,” says Richard 
Neher, a population geneticist at the Max 
Planck Institute for Developmental Biology in 
Tubingen, Germany, “and putting things up on 
arXiv is a natural thing for me to do.” Neher
has co-authored more than ten submissions to 
arXiv since 2004. 

Despite such examples, one of the best- 
known life-sciences preprints on arXiv comes 
from microbiology, not population genetics. 
Posted as a rebuttal to a 2011 Science paper 
reporting that a strain of Halomonas bacteria 
from a Californian lake could incorporate arsenic
into its DNA (ref. 4), the preprint appeared on the
server in January (ref. 5) before its publication in Science
this month (ref. 6). Yet a population geneticist still
played a part — the paper’s co-author, Leonid 
Kruglyak, of Princeton University in New Jersey 
and the Howard Hughes Medical Institute, 
works in the field. Kruglyak says he will con- 
sider arXiv for future studies from his lab. 

Another attention-grabbing submission 
by prominent geneticists, posted on 23 July, 
compares genomic variation in 22 African 
populations to suggest an ancient genetic link 
between people in southern and eastern Africa (ref. 1).
One of the paper’s senior authors, geneticist 
David Reich of Harvard Medical School in
Boston, Massachusetts, publishes routinely in 
Nature and the Public Library of Science jour- 
nals, and co-author Carlos Bustamante, of 
Stanford University School of Medicine in Cali- 
fornia, is a leader in the field. Reich says that first 
author Joseph Pickrell, also at Harvard Medical 
School, suggested using arXiv. Reich and the 
other co-authors saw no good reason not to post 
the manuscript there. “It could be an example of 
the younger generation coming in and finding 
this sort of thing natural,” says Ginsparg. 

For most life scientists, however, pre-publi- 
cation is still “more of a trickle” than a trend, 
says statistical geneticist Graham Coop of 
the University of California, Davis. He and 
his postdoc, statistician Peter Ralph, posted a 
paper on 16 July analysing genetic relatedness 
among neighbouring European populations, 
and Coop remains bullish about arXiv’s poten- 
tial. “Biology will soon have to embrace this 
trend fully: the speed of discussion, comment 
and pre-publication review allowed is needed 
in biology more than most fields,” he says. 

Yet some population geneticists still feel that 
those posting to arXiv are sticking their necks 
out. Some biology journals, such as those pub- 
lished by the Ecological Society of America in 
Washington DC, for example, expressly pro- 
hibit pre-publication in citable public archives. 
And there are the concerns over establishing 
who was first to a discovery. 

But Ginsparg says that pre-publication 
is more likely to stop scientists from being 
scooped. In many physics fields, publication on 
arXiv is what counts for claiming priority, and 
journal reviewers can use the server to check 
that discoveries are correctly attributed. An 
authoring history that accompanies all arXiv 
papers also allows scientists to arbitrate dis- 
putes over priority. In the 21 years since arXiv 
began, Ginsparg has seen astrophysicists, com- 
puter scientists and others go from sceptics to 
devotees. “Once a community adopts arXiv, it 
never seems to relinquish it,” he says.


1. Pickrell, J. K. et al. Preprint at http://arxiv.org/abs/1207.5552 (2012).

2. Ralph, P. & Coop, G. Preprint at http://arxiv.org/abs/1207.3815 (2012).

3. Ginsparg, P. Nature 476, 145–147 (2011).

4. Wolfe-Simon, F. et al. Science 332, 1163–1166 (2011).

5. Reaves, M. L. et al. Preprint at http://arxiv.org/abs/1201.6643 (2012).

6. Reaves, M. L. et al. Science 337, 470–473 (2012).
CORRECTIONS

The graph in the News story ‘Gene data to 
hit milestone’ (Nature 487, 282-283; 2012) 
miscounted data sets in ArrayExpress for 
the years 2003-11. The corrected graph 
can be seen online at go.nature.com/2wrlpx. 


The News Feature ‘Beta test’ (Nature 487, 
160-162; 2012) gave the wrong affiliation 
for Stefan Schönert. He is at the Technical
University Munich. 
The motivating issues vary, but episodes
of violent political upheaval in the United 
States are surprisingly regular. 


Racial, class and political 
tensions peaked during and 
after the Civil War. 
of the First World War.
HISTORY AS SCIENCE


Advocates of ‘cliodynamics’ say that they can use
scientific methods to illuminate the past. But historians
are not so sure.


BY LAURA SPINNEY 


Sometimes, history really does seem to
repeat itself. After the US Civil War, for 
example, a wave of urban violence fuelled 
by ethnic and class resentment swept across 
the country, peaking in about 1870. Internal 
strife spiked again in around 1920, when race 
riots, workers’ strikes and a surge of anti- 
Communist feeling led many people to think 
that revolution was imminent. And in around 
1970, unrest crested once more, with violent stu- 
dent demonstrations, political assassinations, 
riots and terrorism (see ‘Cycles of violence’). 
To Peter Turchin, who studies population 
dynamics at the University of Connecticut in 
Storrs, the appearance of three peaks of politi- 
cal instability at roughly 50-year intervals is not 
a coincidence. For the past 15 years, Turchin 
has been taking the mathematical techniques 
that once allowed him to track predator-prey 
cycles in forest ecosystems, and applying them 
to human history. He has analysed historical 
records on economic activity, demographic 
trends and outbursts of violence in the United
States, and has come to the conclusion that
a new wave of internal strife is already on its
way¹. The peak should occur in about 2020,
he says, and will probably be at least as high as
the one in around 1970. “I hope it won't be as
bad as 1870,” he adds.

Turchin’s approach — which he calls clio- 
dynamics after Clio, the ancient Greek muse of 
history — is part of a groundswell of efforts to 
apply scientific methods to history by identify- 
ing and modelling the broad social forces that 
Turchin and his colleagues say shape all human 
societies. It is an attempt to show that “history 
is not ‘just one damn thing after another’”, says
Turchin, paraphrasing a saying often attributed 
to the late British historian Arnold Toynbee. 

Cliodynamics is viewed with deep scepti- 
cism by most academic historians, who tend 
to see history as a complex stew of chance, 
individual foibles and one-of-a-kind situa- 
tions that no broad-brush ‘science of history’ 
will ever capture. “After a century of grand 
theory, from Marxism and social Darwinism to 
structuralism and postmodernism,
most historians have abandoned
the belief in general laws,” said Robert
Darnton, a cultural historian at
Harvard University in Cambridge,
Massachusetts, in a column written
in 1999.

Most think that phenomena 
such as political instability should 
be understood by constructing 
detailed narratives of what actu- 
ally happened — always looking 
for patterns and regularities, but 
never forgetting that each out- 
break emerged from a particular 
time and place. “We're doing what 
can be done, as opposed to aspiring 
after what can't,” says Daniel Szechi, 
who studies early-modern history 
at the University of Manchester, 
UK. “We're just too ignorant” to 
identify meaningful cycles, he adds. 

But Turchin and his allies con- 
tend that the time is ripe to revisit 
general laws, thanks to tools such 
as nonlinear mathematics, simu- 
lations that can model the inter- 
actions of thousands or millions of 
individuals at once, and informat- 
ics technologies for gathering and 
analysing huge databases of his- 
torical information. And for some 
academics, at least, cliodynamics 
can’t come a moment too soon. 
“Historians need to abandon the 
habit of thinking that it’s enough 
to informally point to a sample of 
cases and to claim that observations 
generalize,” says Joseph Bulbulia,
who studies the evolution of reli- 
gion at Victoria University of Wel- 
lington in New Zealand. 


FROM ECOLOGY TO HISTORY 
Turchin conceived cliodynamics 
during what he jokingly calls a 
midlife crisis: it was 1997, he was 
40 years old, and he had come to 
feel that all the major ecological questions about population dynamics 
had been answered. History seemed to be the next frontier — perhaps 
because his father, the Russian computer scientist Valentin Turchin, had 
also wondered about the existence of general laws governing societies. 
(The elder Turchin’s dissident writings about the origins of totalitarian- 
ism were among the reasons that the Soviet Union exiled him in 1977, 
after which he moved his family to the United States.) 

What is new about cliodynamics isn't the search for patterns, Turchin 
explains. Historians have done valuable work correlating phenomena 
such as political instability with political, economic and demographic 
variables. What is different is the scale — Turchin and his colleagues 
are systematically collecting historical data that span centuries or even 
millennia — and the mathematical analysis of how the variables interact. 

Periods of rioting and upheaval have recurred roughly every 50 years in US history.

In their analysis of long-term social trends, advocates of cliodynamics focus on four main variables: population numbers, social structure, state strength and political instability. Each variable is measured in several ways. Social structure, for example, relies on factors such as health inequality, measured using proxies including quantitative data on life expectancies, and wealth inequality, measured by the ratio of the largest fortune to the median wage. Choosing appropriate proxies can be a challenge, because relevant data are often hard to find. No proxy is perfect, the researchers concede. But they try to minimize the problem by choosing at least two proxies for each variable.
Then, drawing on all the sources 
they can find — historical data- 
bases, newspaper archives, ethno- 
graphic studies — Turchin and his 
colleagues plot these proxies over 
time and look for trends, hop- 
ing to identify historical patterns 
and markers of future events. For 
example, it seems that indicators 
of corruption increase and politi- 
cal cooperation unravels when a 
period of instability or violence 
is imminent. Such analysis also 
allows the researchers to track the 
order in which the changes occur, 
so that they can tease out useful 
correlations that might lead to 
cause-effect explanations. 
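The proxy arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming yearly series for the largest fortune and the median wage are already in hand. The function names are hypothetical, and combining two proxies by averaging their z-scores is an assumption made for the example, not the researchers' documented procedure.

```python
from statistics import mean, pstdev


def wealth_ratio(largest_fortune, median_wage):
    """Wealth-inequality proxy: largest fortune divided by the median wage, per year."""
    if len(largest_fortune) != len(median_wage):
        raise ValueError("series must cover the same years")
    return [f / w for f, w in zip(largest_fortune, median_wage)]


def zscores(series):
    """Standardize a series so that proxies on different scales can be compared."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series carries no trend information")
    return [(x - mu) / sigma for x in series]


def combined_index(proxy_a, proxy_b):
    """Hypothetical composite index: the mean of two standardized proxies."""
    return [(a + b) / 2 for a, b in zip(zscores(proxy_a), zscores(proxy_b))]


# Illustrative numbers only, not historical data.
ratios = wealth_ratio([100.0, 200.0, 400.0], [1.0, 1.0, 2.0])
print(ratios)  # [100.0, 200.0, 200.0]
```

Because each proxy is standardized first, a rise in the composite means both measures sit above their own long-run averages; two proxies that disagree pull the index back towards zero, which is one simple way to blunt the imperfection of any single proxy.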


ENDLESS CYCLES 

When Turchin refined the con- 
cept of cliodynamics with two col- 
leagues — Sergey Nefedov of the 
Institute of History and Archaeol- 
ogy in Yekaterinburg, Russia, and 
Andrey Korotayev of the Russian 
State University for the Humani- 
ties in Moscow — the researchers 
found that two trends dominate 
the data on political instability. 
The first, which they call the secu- 
lar cycle, extends over two to three 
centuries. It starts with a relatively 
egalitarian society, in which supply 
and demand for labour roughly balance out. In time, the population 
grows, labour supply outstrips demand, elites form and the living stand- 
ards of the poorest fall. At a certain point, the society becomes top-heavy 
with elites, who start fighting for power. Political instability ensues and 
leads to collapse, and the cycle begins again. 

Superimposed on that secular trend, the researchers observe a shorter 
cycle that spans 50 years — roughly two generations. Turchin calls this 
the fathers-and-sons cycle: the father responds violently to a perceived 
social injustice; the son lives with the miserable legacy of the resulting 
conflict and abstains; the third generation begins again. Turchin likens 
this cycle to a forest fire that ignites and burns out, until a sufficient 
amount of underbrush accumulates and the cycle recommences. 

These two interacting cycles, he says, fit patterns of instability across
Europe and Asia from the fifth century BC onwards. Together, they
describe the bumpy transition of the Roman Republic to the Roman
Empire in the first century BC. He sees the same patterns in ancient
Egypt, China and Russia, and says that they explain the timing of last
year’s Egyptian uprising, which took the regime of then-president
Hosni Mubarak by surprise. At the time, the Egyptian economy was
growing and poverty levels were among the lowest in the developing 
world, so the regime could reasonably have expected stability. In the 
decade leading up to the revolution, however, the country saw a quadru- 
pling of graduates with no prospects — a marker of elite overproduction 
and hence, Turchin argues, trouble. 

Turchin has also applied this approach to other historical puzzles,
such as how religions grow. Several models have been proposed. One is
that they grow in a linear fashion as nonbelievers spontaneously ‘see the
light’. Another model holds that the number of converts increases exponentially,
like infections with a contagious disease, as outsiders come
into contact with growing numbers of converts. Using several independent
proxies, Turchin has mapped conversions to Islam in medieval Iran
and Spain, and found that the data fit the contagion model most closely².
Using the same techniques, he has also shown that the model describes
the expansion of Christianity in the first century AD, and of Mormonism
since the Second World War.
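The difference between the two growth models can be made concrete with a short simulation. This is a sketch under stated assumptions: a yearly time step, a fixed population `n`, and a logistic form for the contagion model, in which new converts are proportional to converts times non-converts, so growth is exponential at first and then saturates. The function names and parameters are illustrative; they are not Turchin's equations or data.

```python
def linear_model(c0, k, years):
    """Spontaneous conversion: a constant number k of new converts each year."""
    return [c0 + k * t for t in range(years + 1)]


def contagion_model(c0, r, n, years):
    """Contagion (assumed logistic form): new converts are proportional to
    contacts between converts (c) and non-converts (n - c)."""
    c = c0
    path = [c]
    for _ in range(years):
        c = c + r * c * (n - c) / n
        path.append(c)
    return path


def sse(observed, predicted):
    """Sum of squared errors: a smaller value means a closer fit."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Illustrative parameters only: a population of 100, starting with 1 convert.
curve = contagion_model(1.0, 0.5, 100.0, 40)
yearly_gains = [b - a for a, b in zip(curve, curve[1:])]
```

Under the linear model the yearly gain is constant; under the contagion model it rises while converts are scarce, peaks when roughly half the population has converted, and then falls. Fitting each model's parameters to dated conversion counts and comparing their `sse` values is one simple way to ask which model fits more closely.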

Claudio Cioffi-Revilla, a computational social scientist at George Mason
University in Fairfax, Virginia, welcomes cliodynamics as a natural 
complement to his own field: doing simulations using ‘agent-based’ 
computer models. Cioffi-Revilla and his team are developing one such 
model to capture the effects of modern-day climate change on the Rift 
Valley region in East Africa, a populous area that is in the grip of a 
drought. The model starts with a series of digital agents representing 
households and allows them to interact, following rules such as sea- 
sonal migration patterns and ethnic alliances. The researchers have 
already seen labour specialization and vulnerability to drought emerge 
spontaneously, and they hope eventually to be able to predict flows of 
refugees and identify potential conflict hotspots. Cioffi-Revilla says that 
cliodynamics could strengthen the model by providing the agents with 
rules extracted from historical data. 


GLOBAL TRENDS 

Cliodynamics has another ally in Jack Goldstone, director of the Center 
for Global Policy at George Mason University and a member of the 
Political Instability Task Force, which is funded by the US Central Intel- 
ligence Agency to forecast events outside the United States. Goldstone 
has searched for cliodynamic patterns in past revolutions, and predicts 
that Egypt will face a few more years of struggle between radicals and 
moderates and 5-10 years of institution-building before it can regain 
stability. “It is possible but rare for revolutions to resolve rapidly,” he
says. “Average time to build a new state is around a dozen years, and 
many take longer.” 

But Goldstone cautions that cliodynamics is useful only for looking 
at broad trends. “For some aspects of history, a scientific or cliodynamic 
approach is suitable, natural and fruitful,” he says. For example, “when
we map the frequency versus magnitude of an event — deaths in vari- 
ous battles in a war, casualties in natural disasters, years to rebuild a 
state — we find that there is a consistent pattern of higher frequencies 
at low magnitudes, and lower frequencies at high magnitudes, that follows
a precise mathematical formula.” But when it comes to predicting
unique events such as the Industrial Revolution, or the biography of 
a specific individual such as Benjamin Franklin, he says, the conven- 
tional historian’s approach of assembling a narrative based on evidence 
is still best. 
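Goldstone does not say which formula he means. Frequency-versus-magnitude patterns of this kind are commonly modelled as a power law, p(x) ∝ x^(−α), so the sketch below assumes that form. It estimates α from a list of event magnitudes with the standard continuous maximum-likelihood estimator, α̂ = 1 + n / Σ ln(x_i / x_min), where x_min is a chosen lower cutoff and n counts the events at or above it. The function names are hypothetical, and the synthetic data stand in for real casualty or rebuilding-time records.

```python
import math
import random


def power_law_alpha(magnitudes, x_min):
    """Maximum-likelihood exponent of a continuous power law p(x) ~ x**(-alpha),
    using only observations at or above x_min."""
    tail = [x for x in magnitudes if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal x_min; exponent is undefined")
    return 1 + len(tail) / log_sum


def sample_power_law(alpha, x_min, size, seed=0):
    """Synthetic magnitudes drawn by inverse-transform sampling (for testing only)."""
    rng = random.Random(seed)
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(size)]


# Recover a known exponent from synthetic "event sizes".
events = sample_power_law(2.5, 1.0, 50_000, seed=42)
print(round(power_law_alpha(events, 1.0), 2))
```

A larger α means that big events are rarer relative to small ones. The cutoff x_min has to be chosen with care in practice, because small events are often under-recorded and the power law may hold only above some size.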

Herbert Gintis, a retired economist who is still actively researching 
the evolution of social complexity at the University of Massachusetts 
Amherst, also doubts that cliodynamics can predict specific historical 
events. But he thinks that the patterns and causal connections that it 
reveals can teach policy-makers valuable lessons about pitfalls to avoid, 
and actions that might forestall trouble. He offers the analogy of avia- 
tion: “You certainly can't predict when a plane is going to crash, but 
engineers recover the black box. They study it carefully, they find out 
why the plane crashed, and that’s why so many fewer planes crash today 
than used to.” 

None of these arguments, however, has done much to soften 
scepticism among historians in general. The essential weakness of any
attempt to make predictions based on trends, says Szechi, is the appall- 
ing patchiness of historical information. Records can be preserved or 
destroyed by chance: in 1922, for example, fighting in the Four Courts 
area of Dublin during the Irish Civil War led to a fire that destroyed the 
country’s entire medieval archive. More generally, says Szechi, knowl- 
edge tends to pool around narrow subject areas. “We can tell you in great 
detail what the grain prices were in a few towns in southern England 
in the Middle Ages,” he says. “But we can't tell you how most ordinary 
people lived their lives.”

Concerted efforts are now under way to fill those holes. Harvey Whitehouse, an anthropologist at the University of Oxford, UK, is overseeing the construction of a database of information about rituals, social structure and conflict around the globe since records began. It is a huge undertaking, involving historians, archaeologists, religious scholars, social scientists and even neuroscientists, and it will take decades to complete, assuming that funding can be found beyond the UK government's current 5-year commitment. But Whitehouse […] feeding the database will complement Turchin’s approach by throwing light on the immediate triggers of political violence. He argues³, for example, that for such violence to happen, individuals must begin to identify
strongly with a political group. One powerful way for groups to cement 
that identification is through rituals, especially frightening, painful or 
otherwise emotional ones that create a body of vivid, shared memories. 

“People form the impression that the most profound insights they 
have into their own personal history are shared by other people,” says 
Whitehouse, who explored this fusion of identities in an as-yet unpub- 
lished survey of revolutionary brigades in Misrata, Libya, last December, 
along with his colleague Brian McQuinn, an anthropologist at Oxford 
who studies civil wars. Only once such fusion has occurred do people 
become willing to fight and die for the group, he says. Therefore, if 
Turchin’s prediction of unrest in the United States around 2020 is cor- 
rect, Whitehouse would expect the next few years to see an increase 
in tightly knit US groups whose rituals have a threatening quality but 
promise great rewards. 

Turchin can’t say who those groups might be, what cause they will 
be fighting for or what form the violence will take. Previous bouts of 
turbulence were not dominated by any one issue, he says. But he already 
sees the warning signs of social strife, including a surplus of graduates 
and increasing inequality. “Inequality is almost always a bad thing for 
societies,” he says. 

That said, Turchin insists that the violence is no more inevitable than 
an outbreak of measles. Just as an epidemic can be averted by an effective 
vaccine, violence can be prevented if society is prepared to learn from 
history: if the US government creates more jobs for graduates, say, or
acts decisively to reduce inequality. 

But perhaps revolution is the best, if not the only, remedy for severe 
social stresses. Gintis points out that he is old enough to have taken 
part in the most recent period of turbulence in the United States, which 
helped to secure civil rights for women and black people. Elites have 
been known to give power back to the majority, he says, but only under 
duress, to help restore order after a period of turmoil. “I'm not afraid of 
uprisings,” he says. “That’s why we are where we are.”




Laura Spinney is a freelance writer in Lausanne, Switzerland. 
Physical scientists protested against funding reforms with a mock funeral for British science.


Duel to the death


Physicists, chemists and mathematicians in the 
United Kingdom are furious about funding reforms 
that they say threaten blue-skies research. 


The horse-drawn Victorian hearse canters past London’s Houses of Parliament and round Parliament Square before trotting smartly down Whitehall. Straggling behind it comes an eclectic mix of scientists, ranging from students with body piercings to tweed-jacketed professors. The spectacle makes an odd addition to the spring afternoon traffic, bringing to mind an elegant, if unusual, state funeral.

The truth, however, is much stranger. “We're protesting,” explains one PhD student to a puzzled tourist, “against our research funder.”

That funder is the Engineering and Physical Sci- 
ences Research Council (EPSRC), the government 
body that holds the biggest public purse for phys- 
ics, mathematics and engineering research in the 
United Kingdom. Facing a growing cash squeeze 
and pressure from the government to demonstrate 
the economic benefits of research, in 2009 the coun- 
cil’s chief executive, David Delpy, embarked on 
a series of controversial reforms.
Some were intended to cut the 
overwhelming number of grant proposals that the 
EPSRC receives, by limiting resubmissions and tem- 
porarily blocking people who submitted too many 
unsuccessful applications from sending in more. A 
second set of reforms included requirements that 
grant applicants explain how their research might 
generate economic or other benefits, as well as a vig- 
orous overhaul of the EPSRC’s research portfolio. 
The changes incensed many physical scientists, 
who protested that the policy to blacklist grant 
applicants was draconian. They complained that 
the EPSRC’s decision to exert more control over 
the fields it funds risked sidelining peer review and 
would favour short-term, applied research over curi- 
osity-driven, blue-skies work in a way that would be 
detrimental to British science. The souring relation- 
ship between the EPSRC and parts of its constituency 
reached a conspicuously public nadir in May, when 
disaffected researchers launched the ‘Science
for the Future’ campaign with the hearse
stunt, which ended by delivering the coffin, 
signifying the death of British science, and a 
petition demanding the “immediate reform 
of the EPSRC’s policies” to the prime 
minister in Downing Street. In a letter 
to The Daily Telegraph newspaper in 
support of the protestors, nine Nobel 
laureates in the United Kingdom and 
United States accused the EPSRC of 
“manipulating the process of peer 
review” and “establishing favouritism 
schemes”. 

The battle could be played out else- 
where. Many government funding 
bodies are facing diminishing budg- 
ets in the wake of the global financial 
crisis, and increasing pressure from 
politicians to show that the research 
they are funding will contribute to 
economic growth. “This is a challenge 
for research funding agencies across 
the globe,” says Julia Lane, an expert 
in science policy who formerly worked 
at the National Science Foundation 
(NSF) in Arlington, Virginia, and is 
now a senior managing economist at 
the American Institutes for Research 
in Washington DC. “And they’re all 
struggling to provide enough informa- 
tion for policy-makers so that they can keep 
funding going for basic research.” 

In Britain, it looks likely that researchers will 
have to live with this new reality: Delpy and his 
supporters say that the policies are an unavoid- 
able reaction to a slumping budget and that 
there will be no U-turns. According to Delpy’s 
detractors, however, the EPSRC provides a 
lesson in how not to implement such reforms. 
“They shot themselves in the foot by alienating
the community that they’re here to serve,”
says Paul Clarke, a synthetic organic chemist at 
the University of York, and one of the EPSRC’s 
most vociferous critics. 


Figure panels: grant applications in physical sciences (top); EPSRC budget (£ millions), adjusted for inflation (bottom).

OPENING ROUND 

The tensions started rising in 2007, shortly 
after Delpy, a physicist by training, left his 
post managing the research portfolio at Uni- 
versity College London to take command of 
the EPSRC and its research budget of some 
£800 million (US$1.3 billion). Delpy faced a 
problem: an overwhelming number of grant 
applications and a flat budget were starting to 
push up rejections. In 2008, the success rate 
for applications dropped from a typical 30% 
to 26% overall and even lower in some fields 
(see ‘Physics of funding’). In March 2009, the 
EPSRC announced that from the following 
month researchers could no longer resub- 
mit grant proposals that had been rejected, 
a policy that has since been implemented at 
several other UK research councils. At around 
the same time, it also barred researchers 
with a record of rejections from sending
in any further funding applications for
12 months, a policy that initially hit more
than 200 people.

PHYSICS OF FUNDING
The UK’s main physical-science funder introduced policies to cut grant submissions in 2009 (top), partly in response to a declining budget (bottom).

The changes outraged physical scientists. Within 2 weeks of the blacklisting policy being announced, more than 1,200 people had signed an online petition demanding that it be rescinded (see Nature 458, 391; 2009).
In May 2009, the EPSRC was forced to water 
down the policy by delaying its introduc- 
tion to April 2010, and allowing researchers 
to apply for one grant during the 12-month 
‘cooling-off’ period. The EPSRC says that 
only ten researchers are currently blacklisted. 

Clarke is one of those ten. Six years ago he 
concurrently held three EPSRC grants — more 
than any other researcher at the same career 
stage. But in January, after submitting a third 
unsuccessful proposal, he was told that he 
had been automatically barred from sending 
in more. Clarke, who works on the chemi- 
cal origins of life and on synthesizing natu- 
ral compounds, says that before the EPSRC 
banned resubmissions, he was able to address 
critiques of rejected proposals and send them 
back for re-review — after which they would 
often be successful. Rather than being a prob- 
lem, he views the large number of proposals 
as a sign of scientists with many good ideas, 
and advocates widening the pool of reviewers 
by mandating that those who submit EPSRC 
grants also review a certain number of propos- 
als each year. 

The EPSRC was also under growing pres- 
sure to demonstrate the impact of its invest- 
ment in research. This grew out of the 2006 
Warry report, Increasing the Economic Impact 
of Research Councils, as well as subsequent 
reports, demanding that all the UK research 
councils show that they are getting the most 
bang for their buck. In 2009, the research
councils started to require that funding appli- 
cants submit a two-page Pathways to Impact 
statement, summarizing how they intended 
to maximize the societal or economic ben- 
efits of a project — through commer- 
cialization of results, for example, or 
through public outreach. In Novem- 
ber 2011, the EPSRC added a ‘national 
importance’ criterion, requiring that 
researchers describe in a separate 
statement “the extent to which the 
research proposed has the potential, 
over 10-50 years, to meet national 
strategic needs”. 

Similar requirements are already 
routine at some other funding agencies,
such as the NSF, which has asked
applicants to explain their proposed 
project's ‘broader impacts’ on science 
and society since 1997. That require- 
ment has come in for some criticism 
(see Nature 475, 141; 2011) — but not 
the level of anger that the new request 
seemed to inspire in Britain’s physi- 
cal scientists. The impact statements, 
they said, showed that the EPSRC is 
inappropriately favouring short-term 
projects that will have economic ben- 
efits. “It is changing the fundamen- 
tal ethos of research to make it more 
responsive to the market,” says physicist Philip 
Moriarty of the University of Nottingham, an 
outspoken critic of the EPSRC reforms. And 
the national-importance criterion, say some, 
is simply asking for the impossible. “It's very 
hard to justify the economic importance of 
work that might not become applicable to real- 
world problems for decades,” says postdoctoral 
mathematician Will Merry at the University of 
Cambridge, UK. (Merry’s research problem, 
how to solve the motions of three or more 
interacting bodies, was first set out by Isaac 
Newton in 1687 — suggesting that science 
sometimes has to take the long road.) 

Martin Rees, former president of the Royal 
Society in London and a cosmologist at the 
University of Cambridge, says he worries that 
the new requirements could affect how young 
researchers apply for their first grant. “It’s 
going to make them slant their application in 
a way that might not be optimal from the point 
of view of the research,” he says. The council
should focus on making sure that the “brightest
people don’t get discouraged”, adds Rees,
who says he finds the idea of asking research- 
ers to write about the potential future national 
importance of their work “absurd”. 

But the EPSRC had more reforms up its 
sleeve. In July 2011, it published the first of 
three phases in its Shaping Capability strategy, 
which divided the organization’s research port- 
folio into more than 100 fields, with the aim 
of maintaining or expanding areas of national 
importance and excellence, such as catalysis 
and energy storage, and shrinking others such 
as mathematical physics and mobile computing.
The council revealed that future postdocs
and other fellowships would be funded only 
in areas that were in line with the strategy; at 
the outset, for example, mathematics postdocs 
would be funded only in statistics and applied 
probability. 

To the critics, that decision was further 
evidence that the council ranked short-term 
pay-off above blue-skies research, in this case 
favouring the needs of the City, London’s large 
financial sector, which is hungry for statisti- 
cal expertise. The policy had an immediate 
impact for Merry, who finished a 12-month 
EPSRC doctoral-prize fellowship and strug- 
gled to find a postdoc at home. He is now 
set to start one at ETH Zurich in September. 
Merry says that all the maths PhD students 
he knows at the University of Cambridge are 
following a similar path. “The most visible 
outcome of the change in funding is that to the 
best of my knowledge they’re all going abroad 
next year,” he says. 

Some researchers also have a broader con- 
cern: that EPSRC administrators are taking 
over the role of working scientists in decid- 
ing how money should be spent and, as a 
consequence, are funding mediocre research 
in arbitrarily selected areas. Instead, they say, 
the EPSRC should focus on supporting the 
best-quality science in any area of its remit, 
as judged by peer review. “Non-scientists are 
making decisions that impact on the future 
spend of science money — and that is wrong,” 
says Tony Barrett, a synthetic organic chem- 
ist at Imperial College London and one of the 
organizers of the Science for the Future cam- 
paign. “They are not qualified to make those 
decisions.”

The uproar over impact and national impor- 
tance grew so loud that, in November 2011, 
the House of Lords Science and Technol- 
ogy Committee held a session to discuss the 
policies with Delpy and civil engineer John 
Armitt, then the council’s chair. The EPSRC 
also agreed to further discussions with the 
scientific community before rolling out the 
final phase of the Shaping Capability strategy, 
which it did in March 2012. The announce- 
ment, which filled in funding details for just 
over 50 fields, met a frosty, but somewhat less 
angry, reception. 


MAKING IMPACT 

Delpy stoutly defends his organization and 
its reforms. The policies to cut grant applica- 
tions have worked, he says: the success rates 
for applications are back up to around 30-35% 
— “a healthy level of competition”. 

In May, physical scientists called for research-council reform at the House of Commons in London.

He rejects the criticisms of the impact and national-importance strategies. The focus on economic impacts is nothing new, he says, nor does it come from his own experience in applied bioengineering. (He developed techniques for monitoring premature babies.) The 1994 royal charter that officially established the EPSRC states that contributing to the country’s economic competitiveness is one of the body’s three main aims. The prominence placed on it in recent years is largely the result of government pressure, Delpy says. “You must realize that the term ‘economic impact’ was something that was imposed on us by the Treasury.” What’s more, the EPSRC does not expect a precise forecast of what impacts a research proposal could have decades down the line, he says; rather, researchers are encouraged to describe how any impacts that they can foresee might be speedily achieved.

Delpy maintains that peer review is para- 
mount at the agency. The decision to limit 
the EPSRC’s maths postdocs to statistics 
was based on an independent international 
review of maths commissioned by the coun- 
cil and published in 2010, which singled out 
the field as an area of serious concern. “We 
decided we had to do something to get new 
blood into statistics,” he says. “This is not a 
response to government or the call of the City.”
Mathematicians can turn to the Royal Society 
or the Leverhulme Trust in London for fund- 
ing, he points out. And although the research 
council’s budget is set to decline — by 6% in 
cash terms between 2010-11 and 2014-15 — 
he says that the proportion of funding going 
to discovery-led or blue-skies research has 
stayed roughly flat at around 50-60% of the 
total. “Where's the evidence the system is bro- 
ken?” he asks. 

Delpy does regret the rift that the reforms 
have opened up between the council and 
researchers. “Of course we could have done 
things better,” he concedes, particularly in 
communicating what the council was doing 
and why. “I would have liked to have been able 
to carry the community with me and have 
everyone feel that they were engaged.” At the 
same time, he says, “if you change any of the 
ways of working you are going to create some
degree of upset”. 

Britt Holbrook, a philosopher at the Univer- 
sity of North Texas in Denton, who special- 
izes in science policy, says that the EPSRC’s 
biggest mistake lay in banning resubmis- 
sions and blacklisting researchers. “It’s totally 
understandable that did not go down well,” 
he says. “If you stop people from resubmit- 
ting, that cuts out a large part of the value of 
peer review.” But Holbrook, who co-authored
an influential review of the NSF’s broader-impacts
criterion, says that the critics are
fighting a losing battle against the impact
agenda. “People are working against the political
and economic realities,” he warns, and
would be better off trying to shape the drive 
for impact, rather than block it. 

The hearse parked outside Downing Street, 
however, suggests that this advice is falling 
on deaf ears; the organizers of Science for 
the Future say that they are planning further 
protests over the summer. And Delpy’s critics 
have little time for his explanations, saying, for 
example, that they are sceptical of EPSRC fig- 
ures showing no decline in blue-skies research, 
given that the council defines the term itself. 
Clarke prefers to point to figures showing that 
in 2010-11, the EPSRC funded 151 proposals 
in the physical sciences, excluding engineer- 
ing, down from more than 500 in 2004-05. 
Yet there are more than 3,000 scientists in the 
United Kingdom who are eligible to apply for 
EPSRC funding. 

“People would rather not submit than 
submit, get blacklisted and hence be seen as 
a failing academic by their department,” says
Clarke. “There is now a culture of fear in aca-
demic departments.”


Ananyo Bhattacharya is Nature’s chief online 
editor. 




SAMPEX, the first of NASA’s inexpensive Small Explorer satellites in orbit, is due back to Earth this year. 


SPACE SCIENCE
Let academia lead


NASA must put more of its money into thrifty missions 
led by principal investigators, says Daniel N. Baker. 


The Mars Curiosity rover, which all
space scientists fervently hope will 
touch down on the red planet safely 
this week, is a prime example of an expen- 
sive and complicated NASA mission. With 
a landing scheme involving 76 pyrotechnic 
devices firing on time and a US$2.5-billion 
price tag, it is a high-risk endeavour. By 
contrast, the Mars Atmosphere and Volatile 
Evolution Mission (MAVEN) is a project 
being run out of our laboratory in Colorado 
to explore Mars’s upper atmosphere and 
ionosphere. It is set to launch in 2013 for 
about $500 million. It is on budget, on 
schedule and promises compelling science. 
Yet the Scout programme, under which 
such small Mars missions were funded, has 
recently been axed. 

The planetary exploration flagship pro- 
grammes and the vastly over-budget James 
Webb Space Telescope are symptomatic of 
a core problem in space research. Increas- 
ingly, NASA’s focus is on big projects that 
promise to return tremendous science 
benefits. But these programmes absorb 
most of the available funding for space 
research. They shift resources away from 
efficient and effective principal investigators 
(PIs) at universities, an approach in which a 
single person is responsible to NASA for the 
success of a mission, and towards bureau- 
cratic NASA centres. This is the wrong 
direction for space research, especially in a 
time of scarce funding. 

In my opinion, we need to turn civilian 
space-policy thinking on its head. Missions 
managed by PIs should be the highest priority 
for NASA, not the lowest. I am not talking 
about the ‘faster, better, cheaper’ approach 
of the 1990s, with skeleton crews of engi- 
neers at NASA centres. I am talking about 
missions led by university scientists with a 
real passion for research. This strategy would 
reduce budgetary overruns, increase the 
frequency of launches and enhance excite- 
ment like few other things could. 


THREE-WAY PARTNERSHIP 

At its beginning, the US civilian space 
programme was crafted as a three- 
way partnership between government 
(NASA), industry and academia. From 
the famous 1945 report of engineer and 
policy adviser Vannevar Bush, Science — 
The Endless Frontier, through to the 
unflinching commitment of NASA’s second
administrator, James Webb, the founding 
fathers of space research put universities at the 
centre of almost all NASA science activities. 

Since then, university researchers have 
brought innovation and nimbleness to hard- 
ware development, have exercised tender 
loving care of space instruments and have 
provided a necessary antidote to govern- 
ment stagnation. 

Universities have been a fertile training 
ground for thousands of space engineers 
and researchers, who have learned to be 
creative while sticking to budgets and 
schedules. This has been shown statistically 
in an analysis of historical data by David 
Bearden and his colleagues at the Aero- 
space Corporation in El Segundo, Califor- 
nia, due to be published in September. 

The central and indispensable role of 
universities in space work is now under 
immense stress as budgets tighten and 
NASA withdraws to its centres and core 
industry contractors. Most people who 
knew and understood the essential nature 
of the three-way space partnership are gone. 
Those who have replaced them in policy and 
leadership roles may not have realized or 
absorbed the lessons of the early days. 

University labs are being driven out of busi- 
ness. In the recent ‘Earth Venture’ mission 
selection, four and a half of the five concepts 
selected were for missions led by NASA 
centres. (The half comes from a collaboration 
between a NASA centre and a university.) 
Many space hardware-development groups
that were thriving as recently as five years
ago are now defunct. Those that remain
are struggling. Students are finding fewer 
opportunities for experimentation and are 
not being trained to do things cheaply in a 
‘hands-on’ fashion. Space research has fallen 
into a vicious negative-feedback loop. 

For example, all 15 of the high-priority 
missions recommended for initiation by 
NASA in the 2007 US National Academies’ 
decadal survey, Earth Science and Applica- 
tions from Space, are being implemented 
by collaborations between centres and 
industry. Costs for the first set of missions 
have ballooned by factors of two to three. 
According to a mid-term assessment of the 
survey, in-sourcing work to centres and the 
use of ‘directed’ missions rather than com- 
petitive PI-class missions were among the 
reasons for rising costs. 

With insufficient funds, missions are 
being cancelled or delayed. As ageing 
spacecraft begin to fail, the United States is 
in grave danger of losing its ability to view 
Earth from space. Soon, it will be unable to 
provide decision-makers with the informa-
tion they need to respond to natural hazards
and to the ever-increasing pace of changes
that are occurring in the atmosphere, oceans,
land surface, cryosphere and ecosystems.

NASA's costly Curiosity rover will hopefully touch down on Mars to begin work this week.


THE RIGHT BALANCE 

Some things cannot be done in modest 
PI mode. A dedicated flagship mission is 
needed to develop, for instance, the multiple- 
instrument spacecraft necessary for going to 
the challenging environments of the outer 
planets. The same is true for large-aperture 
facility-class astrophysics programmes 
(including the James Webb Space Telescope). 

I am not arguing that NASA should gut its
centres. Before my present university posi- 
tion, I was a laboratory director at NASA's 
Goddard Space Flight Center in Greenbelt, 
Maryland, from 1987 to 1994. I saw first- 
hand what immense strengths could be 
mustered with a critical mass of engineering 
and science talent. In the current frenzy to 
cut federal budgets, there is a real danger of 
losing vital and unique capabilities at centres 
that have taken years to build and hone. 

There have already been staff reductions, 
such as at the Jet Propulsion Laboratory 
in Pasadena, California, and the planned 
budget cuts mean that more losses may be in 
the offing. We must not allow the navigation, 
propulsion and communication skills that 
enable space to be explored to slip through 
our fingers. If we do, we may never again be 
able to traverse the rings of Saturn, nor land 
on an enticing asteroid — nor one day plumb
the depths of Europa’s oceans. 

But we must not allow centres to be 
maintained at exorbitant staffing levels 
irrespective of cost. Too many institutions 
employ workers who are performing routine 
and often-unnecessary functions. I have 
recently seen dozens of extra managers and 
engineers assigned to NASA programmes 
just to give them accounts to charge to. Many
centre-led missions are costly because they 
focus more on maintaining jobs than on get- 
ting the biggest scientific bang for the buck. 

There is ample support in the space- 
research community for a more balanced 
space programme. Many US National 
Research Council (NRC) reports and decadal 
surveys have made clear calls for more PI-led 
missions. And NASA’s Explorer programme
has used PIs to study focused space physics 
and astrophysics. Similarly, the Discovery and 
New Frontiers programmes in its Solar System 
exploration division are PI missions to study 
planetary-science issues of moderate scale. But 
the general trend is away from such missions. 

Planning groups at NASA should work 
with the NRC, the other National Acad- 
emies and the Office of Management and 
Budget to shift towards the kind of balanced 
programme that I advocate here. Allocating 
a few hundred million more dollars from 
NASA's $5-billion space-research budget 
to the PI end of the spectrum could work 
wonders, in my view. 

With government budgets tightening, 
space research should be revived in universi- 
ties because they are the best places to foster 
innovative thinking and to get science done 
in an affordable way. They are also where we 
must train the scientists and engineers who 
will bring an aggressive, nimble mindset to 
a brighter future NASA.


Daniel N. Baker is professor of astrophysical 
and planetary sciences and director of the 
Laboratory for Atmospheric and Space 
Physics at the University of Colorado, Boulder, 
USA. He was chief of the Laboratory for 
Extraterrestrial Physics at NASA’s Goddard
Space Flight Center from 1987 to 1994. 
e-mail: daniel.baker@lasp.colorado.edu
Galileo Galilei, pictured centre, after a painting by H. J. Detouche, is imagined in Giulio Tononi’s Phi as an explorer of consciousness.


A quest for consciousness 


Christof Koch marvels at a journey that explains mind-body theory through a fantastical lens.


Phi: A Voyage from the Brain to the Soul
GIULIO TONONI
Pantheon: 2012. 384 pp. £19.99, $30

In the end, consciousness is all that mat-
ters. So writes Giulio Tononi, whose
stunningly original scientific fantasy,
Phi, is a distant echo of that great deduction
by René Descartes. Tononi, a neuroscien-
tist, psychiatrist and expert on sleep and
consciousness, is also that rarest of modern
scholars — an idealist. In this category-defy-
ing book, he presents his quantitative theory
of how brain produces mind as a voyage of
discovery imagined for Galileo Galilei.

In Tononi’s literary telling of this story,
Francis Crick teaches Galileo basic neuro-
science. Galileo learns that the brain is the
seat of the mind, and that consciousness flees
when neurons turn on and off together dur-
ing deep sleep or seizures, as the pair meet
scholars, scientists, doctors and artists from
the Enlightenment to the modern era. The
vast cast includes Descartes, Nicolaus Coper-
nicus, Charles Darwin, Sigmund Freud, Mar-
cel Proust and, eventually, Alan Turing.

Galileo negotiates
some tricky concepts
on a road long trodden
by neuroscientists and
neurologists seeking
to track consciousness
down to its lair in the
brain. Even if we could
point to this biophysi-
cal mechanism, and
those nerve cells, as
mediators of the phe-
nomenal experience
of red, we would still
need to ask — why
these particular mechanisms and neurons?
Why not others? Historically, the great chal- 
lenge has been to explain how conscious- 
ness emerges from highly organized matter 
without invoking magic, soul-stuff or exotic 
physics. 

With the advent of Claude Shannon’s 
information theory in the twentieth century,
scholars averred a link between information 
and conscious experience without work- 
ing out what that might be or could imply. 
Tononi’s integrated-information theory 
does both. Proceeding from two axioms 
that are rooted in everyday phenomenal 
experience, the theory defines a measure 
(the eponymous Φ) that is associated with
every system that consists of causally inter-
acting parts. This measure is high if a system
constitutes a single entity above and beyond 
its parts (integration) and if it is endowed 
with a large repertoire of discriminable states 
(information). The more integrated infor- 
mation any system has, the more conscious 
it is. This framework, couched in a proba- 
bilistic language, also captures the unique 
intrinsic quality of experience — why blue, 
for example, is more similar to red than to 
pain or smell. 

In Phi, this is conveyed through a
series of dazzling thought experi-
ments aided by cameos from Shannon
and philosophers Spinoza, Leibniz and 
Thomas Nagel (the only living person 
to figure in the book). Through them, 
Galileo understands how the algebra of 
integrated information is turned into the 
geometry of conscious experiences, and 
how this links to the physiology and the 
anatomy of the brain. 

In the book’s final third, Tononi lays 
out the implications of his theory. He 
discusses a number of points about con- 
sciousness: that it ceases in death and 
dementia, does not require language or 
knowledge of self, exists in animals in 
graded forms and can be present, to some 
degree, in the fetus. 

Hell, Tononi emphasizes, is all in the 
mind. One of the most chilling charac- 
ters in Phi is the Master, an amalgam of 
the captain in Franz Kafka’s 1914 short
story In the Penal
Colony and the 
Grand Inquisitor 
from Fyodor Dos- 
toyevsky’s novel 
The Brothers Kara- 
mazov (1880). The 
Master's obsession 
is creating perfect never-ending pain by 
manipulating the brain’s informational 
content. In the final chapter, the Man- 
nequin, a stand-in for Mephistopheles, 
throws up some logical paradoxes before 
leaving the dying Galileo reunited with 
his beloved daughter. 

Phi is extraordinary. In its appeal to the 
imagination, it bears some resemblance 
to Edwin Abbott’s Flatland novella (1884) 
or Douglas Hofstadter’s Gödel, Escher,
Bach (Basic Books, 1979). Yet its language 
is more poetic, and full of cultural refer- 
ences and images — film stills and often 
modified coloured photos of artworks. 
Endnotes to each chapter link the alle- 
gories and metaphors in the text to the 
science. 

I believe that in the fullness of time, the 
quantitative framework outlined in Phi 
will prove to be correct. Consciousness is 
tightly linked to complexity and to infor- 
mation, with profound consequences for 
understanding our place in the evolving 
Universe. As Crick says to Galileo, this is 
a “story for grown men, not a consoling 
tale for children”.


Christof Koch is chief scientific officer 
at the Allen Institute for Brain Science
in Seattle, Washington, and professor of
biology and engineering at the California 
Institute of Technology in Pasadena, 
California. 

e-mail: christofk@alleninstitute.org 
Gavin Jantjes’ untitled painting depicts a Khoisan girl creating the Milky Way.


ASTRONOMY 


Under African skies 


Ivan Semeniuk follows the gaze of artists from cultures 
that have interpreted the heavens for millennia. 


Gazing up at a sky full of stars is one of
the most universal of human expe-
riences, cutting across cultures and,
one imagines, stretching back to the dawn of 
humanity. Yet artistic depictions of the heav- 
ens in popular culture are predominantly 
European — from Johann Bayer’s engrav- 
ings of the constellations in his 
1603 star atlas Uranometria 
to the swirling brilliance of 
Van Gogh's 1889 painting 
The Starry Night. 

An exhibition at the 
US National Museum of 
African Art, part of the 
Smithsonian Institution 
in Washington DC, may 
help to change that. It 
showcases a range of con- 
temporary and historical 
pieces by African artists. 
All are connected in one 
way or another to the Sun, 
Moon or stars. 

African Cosmos: Stel- 
lar Arts was sponsored in 
large part by the govern- 
ment of South Africa. 
The country was selected this year, along
with Australia, to host the Square Kilome- 
tre Array, which will be the world’s largest 
radio telescope; that association adds to the 
sense of interplay between the scientific and 
the spiritual that weaves its way through the 
exhibition. The show seamlessly bridges 
the centuries, uniting pieces as 
diverse as traditional moon 
masks from Côte d’Ivoire
and Trembling Field, an 
interactive sculpture by 
South African Karel Nel. Nel 
is resident artist with the 
Cosmic Evolution Survey, 
a project that focuses on 
a two-square-degree field 
of the sky to see how the 
Universe has changed over 
time. 

“Africa has a long and 
rich history of keen observa- 
tion of the heavens,” says the 
exhibition's curator, Christine 
Mullen Kreamer. “Works
of art can allow us
access to that history,
and that knowledge.”

Figures from Central Africa
bear lunar patterns.

African Cosmos: Stellar Arts
National Museum of African Art, Washington DC.
Until 9 December 2012.

The journey begins
on territory that is
both ancient and
familiar, with a series
of pieces from pharaonic Egypt. Representa-
tions of cosmic deities and celestial objects
such as the bright star Sirius are reminders 
of the night sky’s prominent role in the ritu- 
als and beliefs of civilizations along the Nile. 
The exhibition goes on to leap across the 
Sahara and forward in time. Far more exotic 
to non-African eyes are items that date back
only a century or so: a bowl and lid from 
Nigeria representing the domains of Earth 
and sky, or a Dogon stool from Mali, which 
depicts human ancestral figures descend- 
ing from the heavens to populate the land 
below. 

One of the more striking of the con- 
temporary works, an untitled painting by 
South African artist Gavin Jantjes, playfully 
reverses the theme of genesis from above. 
Based on a Khoisan myth from southern 
Africa, it depicts the story of a girl dancing 
around a fire. She throws glowing embers 
high into the night, thereby creating the 
Milky Way, the dominant feature of the 
southern sky. 

So might the creative sparks tossed sky- 
wards from such an exhibition serve to 
illuminate a continent’s worth of artistic 
achievement and potential.


Ivan Semeniuk is Nature’s chief of
correspondents in Washington. 


Books in brief 


Curious Behavior: Yawning, Laughing, Hiccupping, and Beyond
Robert R. Provine HARVARD UNIVERSITY PRESS 246 pp. £18.95 (2012)
How can farting, sneezing and other marginal biological realities
illuminate humanness? Neuroscientist Robert Provine turns
an evolutionary lens on everything from the gross to the faintly 
improper. The ‘contagiousness’ of yawning, for instance, hints at 
the roots of empathy and herd behaviour. Burping and farting were 
involved in the development of speech, says Provine. And tickling 
may play a part in our early understanding that we are distinct 
beings (you can’t tickle yourself). An exercise in ‘small science’ — 
some of it speculative, all of it fascinating. 


The Guardian of All Things: The Epic Story of Human Memory
Michael S. Malone ST MARTIN’S PRESS 304 pp. £18.99 (2012) 
Memory is a kind of relay, with each generation passing its torch 
on to the next — creating a conduit for thought and civilization 
through the eons. In his evocative book, technology writer Michael 
Malone traces that history from the brain’s evolution and the 
development of speech and writing to advances in recording, the 
rise of technology and the shifts in ownership of memory from the 
tribal elect to the masses. The book is packed with gems, including 
a passage on the twelfth century, when Greek and Arabic science 
infused Europe, filling its libraries and helping to seed its universities. 


Dreamland: Adventures in the Strange Science of Sleep
David K. Randall NORTON 304 pp. £17.99 (2012)
Sleep occupies us for one-third of our lives. So insomnia, nightmares,
deprivation and other aspects of bad sleeping are an obsession
for thousands. The tipping point for journalist David Randall was
sleepwalking into a wall. Astonished by a specialist's admission of 
ignorance about the condition, Randall set out to uncover research 
and shine a light into some dark corners. The entertaining result 
covers plenty of territory, from the medieval habit of dividing nightly 
sleeps to the link between vacuum cleaners and sleep apnea. Along 
the way, Randall picks up the basics for crafting a healthy snooze. 


Air: The Restless Shaper of the World
William Bryant Logan NORTON 416 pp. £13.99 (2012)
Arboriculturist William Bryant Logan follows Oak (Norton, 2005) 
and Dirt (Norton, 2007) with this splendid exploration of the 
“floating world” of air — our planet's invisible skin. Starting with 
the tornadoes that hit New York in 2010, he both warns of and 
celebrates the often turbulent and dangerous action of atmosphere. 
Logan delivers vast amounts of science with brevity and elegance, 
and is as breezy describing the billion tonnes of dust that blow 
from African deserts to fertilize the Amazon as he is discussing the 
echolocation skills of some people with sight impairment. 


The Big Muddy: An Environmental History of the Mississippi and 
Its Peoples, from Hernando de Soto to Hurricane Katrina 
Christopher Morris OXFORD UNIVERSITY PRESS 336 pp. £22.50 (2012) 
Seven years ago this month, Hurricane Katrina triggered massive 
flooding in the valley of the Mississippi. The environmental backstory 
of the catastrophe is as rich as river sediment, and historian 
Christopher Morris takes us through 500 years of it. The valley’s 
metamorphosis from vast wetland staked out by France and Spain to 
a patchwork of development — drained swamp, levees, deforestation, 
industry and poor urban planning — is powerfully recounted. 
Budget cuts leave
US science lagging 


As a student of science, I’m
thrilled that scientists at CERN, 
Europe's particle-physics lab, 
have proved the existence of 
the Higgs boson and advanced 
our understanding of the 
Universe. But as an American, 
I’m somewhat saddened. Had 
congressional budget-cutters 
been less short-sighted two 
decades ago, the Higgs boson 
might have been discovered 
by a US-led team instead of by 
a European consortium. On
4 July, no less.

In 1993, Congress cancelled 
funding for the Superconducting 
Super Collider near Waxahachie, 
Texas, after sinking US$2 billion 
into an 87-kilometre particle 
accelerator that promised to 
establish the United States as 
the leader in physics research. 
Two years later, funding was 
approved by CERN to build the 
Large Hadron Collider near 
Geneva, Switzerland. 

US science is facing a growing 
threat from a well-funded 
anti-science movement, abetted 
by those corporations and 
politicians opposed to any 
research that conflicts with their 
own vested interests. 

Apathy towards basic research 
in the United States is coupled 
with an increasing reluctance 
to invest in science projects 
that do not have a foreseeable 
pay-off. But let’s not forget 
that the pioneers of quantum 
mechanics in the 1900s — Niels 
Bohr, Albert Einstein and Erwin 
Schrödinger — were unable
to offer any practical ideas 
about commercial uses for the 
subatomic particles, quarks and 
leptons they were bringing to 
light at the time. However, if you 
are reading this on a computer, 
tablet or smart phone, you have 
quantum mechanics to thank. 

An estimated 30-35% of 
today’s US gross domestic 
product is based on inventions 
derived from quantum theory, 
from semiconductors in 
computer chips and lasers
in compact-disc players to
magnetic resonance imaging in 
hospitals and much more. 

If it doesn’t want to fall 
behind, the United States 
should be following the lead of 
other nations that are investing 
in science and technology to 
benefit their economies. 
William J. Richards Hall 
Institute of Public Policy, New 
Jersey, USA. 
wrichards@hallnj.org 


Improve access to 
sanitation in China 


We hope that last month's 
raising of drinking-water 
standards in China will help 
to speed up improvements in 
the country’s sanitation. As 
in India, sanitation remains 
inadequate for a rapidly 
developing country (Nature 
486, 185; 2012). 

In 2010, 477 million people in 
China (36% of the population) 
did not have access to improved 
sanitation such as a ventilated 
pit latrine or a flush toilet piped 
to a sewer system (WHO/ 
UNICEF Progress on Drinking 
Water and Sanitation, 2012). 
There are national disparities as 
well, with 74% of people having 
improved access to sanitation 
in urban areas in 2010, but only 
56% in rural areas. Provision of 
sanitation facilities for disabled 
people is sparse. 

China's growing population 
and urbanization make 
sewage treatment a particular 
challenge. Although about 
73% of urban sewage is treated 
(China Statistical Yearbook 
on Environment, 2010), more
than 95% of waste water in 
rural areas drains untreated 
into rivers and lakes (X. Sun 
et al. Chinese Agr. Sci. Bull. 26, 
384-388; 2010). 

The country has now 
increased its surveillance of 
freshwater pollution so that the 
new drinking-water standards 
can be met. This should 
catalyse the government into 
investing more in nationwide 
sanitation improvements.
Hong Yang, Jim A. Wright 
University of Southampton, UK. 
hongyanghy@gmail.com 
Stephen W. Gundry University 
of Bristol, UK. 


Better lives, not just 
contraceptives 


Last month’s London Summit on 
Family Planning, hosted by the 
Bill & Melinda Gates Foundation 
and the UK government's 
Department for International 
Development, has been hailed
as a resounding success. A total
of US$2.6 billion was pledged to 
provide 120 million women and 
girls in developing countries with 
access to family-planning services 
by 2020. In measuring the success 
of this welcome campaign, the 
delivery of social change should 
also be taken into account. 

The hosts emphasize that 
results will be rapid and 
quantifiable, for example 
in terms of the number of 
contraceptives supplied. But 
reducing unwanted pregnancies 
requires other improvements 
in women’s lives, such as better 
education for girls and reduced 
child mortality (J. Drèze and
M. Murthi Popul. Dev. Rev.
27, 33-63; 2001), outreach by
community-health workers and 
women’s empowerment (see, 
for example, go.nature.com/ 
bpjgma), and quality family- 
planning programmes. Such 
factors are harder to quantify. 

Focusing simply on what can 
be measured encourages short- 
term, narrow interventions 
rather than broader, longer- 
term strategies. For instance, 
value-for-money criteria make 
it tempting to sidestep national 
health-care systems, when 
supporting these is crucial to 
the delivery of appropriate 
technologies in developing 
countries. 

Devi Sridhar University of 
Oxford, UK. 
devi.sridhar@wolfson.ox.ac.uk 
Karen Grépin New York 
University, USA. 
Bat deaths from
wind turbine blades 


You suggest that wind turbines 
kill bats as a result of air-pressure 
changes when they fly through 
the wake of a spinning blade 
(the barotrauma hypothesis). 
However, this is likely to be 
only a minor cause of bat deaths 
(Nature 486, 310-311; 2012). 

The barotrauma hypothesis 
has been criticized as based 
on erroneous interpretations 
of bat injuries (K. E. Rollins
et al. Vet. Pathol. 49, 362-371;
2012). Evidence from bat 
carcasses shows that blunt- 
force trauma from the spinning 
blades is a much more common 
killing mechanism (see also 
S. M. Grodsky et al. J. Mammal.
92, 917-925; 2011). 

We hope that this finding 
will be useful in mitigating the 
effects of wind turbines on bat 
mortality. 

Angelo Capparella, Sabine 
Loew Illinois State University, 
Normal, Illinois, USA. 
apcappar@ilstu.edu 

David K. Meyerholz University 
of Iowa, Iowa City, USA. 


Giants all around — 
apart from the squid 


A smile would have crossed the 
late Andrew Huxley’s face at your 
description of his “experiments 
on the axon of the giant squid” 
(Nature 486, 10-11; 2012). 
Huxley was giant, the axon was 
giant, but the squid were quite 
average in size. 

Jonathan C. Horton University 
of California, San Francisco, 
California, USA. 
hortonj@vision.ucsf.edu 
COLUMN
The roots of research misconduct 


Mentors should understand what causes misconduct among trainees — and keep in 
mind some possible remedies, argues William Neaves. 


What is the most effective way for men-
tors to prevent misconduct among
trainees? First, they should make
sure that those trainees understand the impor-
tance of research integrity. Consistently model-
ling good practice beats lecturing hands down, 
and discussing ethical guidelines at laboratory 
meetings helps the team to appreciate honesty 
—and the grim consequences of misconduct. 

But mentors should also understand the moti- 
vation behind some acts of misconduct, and the 
steps they can take to make sure that misguided 
trainees don't commit scientific fraud. 

While dean of a US biomedical institution 
more than a decade ago (before my time at the
Stowers Institute), I dealt with three cases of sci-
entific misconduct. Each led to an admission of 
misconduct, sanctions against the perpetrator 
by the US Office of Research Integrity (ORI) 
and public disclosure of the person's identity. 
One case also led to the retraction of several 
publications. In none of the cases was there any 
wrongdoing on the part of the mentors. 

In the first case, a postdoctoral fellow run- 
ning 90-minute experiments found that data 
points tended to plateau after the first 30 min- 
utes. Concluding that nothing of interest hap- 
pened in the final hour, the postdoc started 
fabricating those data points. By taking this 
shortcut, the postdoc quickly generated data, 
which the mentor incorporated in a manu-
script that was then submitted for publication.
After belatedly examining the postdoc’s lab 
notebook, the mentor discovered the discrep- 
ancy between data collected and data included 
in figures, and withdrew the manuscript — but 
not before it had been accepted by a journal. 
When confronted, the postdoc confessed, and 
was fired by the host institution. The case took 
a twist when the postdoc formally accused the 
mentor of encouraging misconduct by pres- 
suring trainees to generate data. The host 
institution conducted an inquiry according 
to ORI standards and found no evidence that 
other trainees in the lab perceived unusual
GENDER IMBALANCE
Action plans for equality 


The 21 members of the League of 
European Research Universities
(LERU) are set to implement strategies
to eliminate gender bias in scientific 
research. Women, Research and 
Universities: Excellence Without Gender 
Bias, a LERU report published on
10 July, outlines recommendations for
universities, funders, policy-makers and 
publishers to improve gender balance 
across the European Union. Institutions 
have agreed to create and launch action 
plans to mitigate bias, and set up gender- 
equality offices. A LERU target group will 
meet this November and annually from 
next spring to monitor progress, says 
Kurt Deketelaere, secretary-general of 
LERU in Leuven, Belgium. The European 
Commission is using the report in its 
construction of the European Research 
Area (see ‘Research area first steps’). 


COLLABORATIONS 
US-European deal 


US early-career researchers have a new 
route for in-person collaborations with 
European colleagues. US National Science 
Foundation (NSE) postdoctoral fellows 
or recipients of Faculty Early Career 
Development awards will be able to 
spend 6-12 months in teams funded by 
European Research Council grants under 
an agreement announced on 13 July. 
David Stonner, director of the NSF office 
of international science and engineering 
in Arlington, Virginia, says that the 
partnerships could lead to positions for 
postdocs and help researchers to build 
and expand networks. “Getting young 
researchers into European research 
networks early in their career will pay 
dividends for years,” says Stonner. 


EUROPE 
Research area first steps 


The European Commission on 17 July 
signed agreements with the first five 
stakeholders of the European Research 
Area. The scheme aims to boost research in 
the European Union (EU) by: coordinating 
member states, funders and research 
organizations; making pensions mobile; 
improving gender equality (see ‘Action 
plans for equality’); and opening hiring 
practices across borders. EU research is 
currently fragmented, with little exchange 
of information and restricted mobility, 

said Maire Geoghegan- Quinn, European 
commissioner for research, in a statement. 
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> pressure to produce results. 

Fortunately for the scientific reputation of 
the mentor, he required lab members to main- 
tain bound notebooks that included details 
of all experiments and data. The case taught 
him always to scrutinize the relevant notebook 
entries before submitting a manuscript. 

In the second misconduct case, a mentor 
had asked a postdoc to purify a protein sam- 
ple so that only a single band remained in a 
western-blot assay. Instead, the postdoc used 
a physical mask so that only one band was 
recorded. A technician found the discarded 
mask and took it to the mentor, who con- 
fronted the postdoc; he admitted falsifying 
the results. The postdoc was fired by the host 
institution and sanctioned by the ORI. 

What if the technician had not discov- 
ered the mask? The mentor could still have 
taken steps to safeguard the integrity of the 
work and his own reputation. Having urged 
the postdoc to purify the sample until a blot 
showed only one band, he could have sought 
evidence of the purification steps in the post- 
doc’s lab notebook. And he could have asked 
a second member of the research team to 
verify that the results were reproducible. 

The final case wreaked havoc on a men- 
tor’s research programme. It started when a 
graduate student falsified a cell-killing assay 
and fabricated data to support the mentor’s 
favoured hypothesis. The fraud continued 
when the mentor retained the culprit as a 
postdoc, on the basis that no one else could 
“make the assay work properly”. Over the 
course of several years, the postdoc manu- 
factured data for multiple publications. 

After the postdoc left the lab to join a bio- 
technology company, the mentor assigned a 
new postdoc to perform the assay. When he 
could not get the expected results, the new 
postdoc personally paid the former postdoc 
to perform the assay for him. Sure enough, 
the results supported the mentor’s theory. But 
the former postdoc would not show the new 
postdoc how he performed the assay. 

The former postdoc then left his biotech- 
nology job, and the mentor rehired him, 
assigning him responsibility for the assay. 
But the rehired postdoc would only perform 
the assay late at night, after everyone had left. 
Frustrated, the new postdoc hid in the lab one 
night and saw the culprit pipette a radioactive 
label directly into scintillation vials, without 
any attempt to recover it from experimental 
samples of labelled cells that had been exposed 
to the (hypothetical) killing agent. The new 
postdoc reported his observations to the men- 
tor, who immediately informed the dean. 

The dean learned that the mentor had never given blinded specimens to the culprit. To avoid having to rely entirely on the new postdoc’s testimony, the university’s chief academic officer advised the mentor to set up a sting operation. The mentor prepared specimens labelled as experimental that contained no radioactivity. When assayed by the rehired postdoc, these specimens yielded radioactivity that only he could have added to the scintillation vials. When first confronted, he denied everything. The ORI reviewed results of the investigation and concluded that the rehired postdoc had engaged in misconduct. Only then did he acknowledge his guilt. The mentor and co-authors from multiple institutions retracted four high-profile publications that had been based on the fabricated data.

On reflection, what could a mentor have done to prevent this debacle? Simply keeping experimental and control specimens blinded during analysis would have sufficed. Moreover, I believe that a mentor in such circumstances should hear alarm bells if only one person in a lab can get the assay to work. Whenever results depend on human manipulation or measurement, team members should verify each other’s work. When a new person joins the lab, the mentor can make it clear that verification practices do not reflect mistrust. For consistency, co-workers should also repeat the mentor’s measurements.

Do such practices prevent fraud? They certainly make it more difficult. Just as importantly, they protect against inadvertent error and subconscious bias. Many of us wish for data that support our theories, and trainees may anticipate outcomes that would please the mentor. In general, the evidence suggests that very few trainees curry favour by fabricating data, but mentors should be careful not to encourage misconduct by signalling their disappointment when a trainee’s data confound expectations. The chances of falsification or fabrication of results are greatly reduced when a lab uses only blinded specimens and when other lab members are always responsible for independently verifying reproducibility.
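Blinding of the kind recommended here can be organized so that the person running an assay sees only opaque codes, while someone else holds the key. The following is a minimal Python sketch, assuming specimens are tracked in a simple dictionary of ID to condition; the function names `blind_specimens` and `unblind` are hypothetical and do not come from any lab-management system.

```python
import random


def blind_specimens(specimens, seed=None):
    """Replace specimen IDs with opaque codes.

    specimens: dict mapping specimen ID -> condition label
               (for example "experimental" or "control").
    Returns (codes, key): the codes handed to the person running the assay,
    and a key (code -> (specimen ID, condition)) kept by someone else.
    """
    rng = random.Random(seed)
    ids = list(specimens)
    rng.shuffle(ids)  # so the code order carries no information about condition
    key = {}
    codes = []
    for i, specimen_id in enumerate(ids, start=1):
        code = f"S{i:03d}"
        key[code] = (specimen_id, specimens[specimen_id])
        codes.append(code)
    return codes, key


def unblind(results, key):
    """Group measurements (code -> value) by condition, only after the assay is done."""
    grouped = {}
    for code, value in results.items():
        _, condition = key[code]
        grouped.setdefault(condition, []).append(value)
    return grouped


if __name__ == "__main__":
    specimens = {"tube-1": "experimental", "tube-2": "control", "tube-3": "experimental"}
    codes, key = blind_specimens(specimens, seed=42)
    print(codes)  # the analyst sees only these codes
    measured = {code: 0.0 for code in codes}  # placeholder measurements
    print(unblind(measured, key))
```

Decoy specimens whose true content is known only to the key holder can be added to `specimens` in the same way, which turns routine blinding into a standing check of the kind used in the sting operation.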

In my experience, mentors often avoid discussing scientific misconduct with lab members, perhaps out of a misguided concern that doing so might imply mistrust. There are ways, however, to circumvent this. For example, mentors could broach the topic by first discussing the increasing incidence of retractions (up tenfold in the past decade; see Nature 478, 26–28; 2011). In that way, they can engage trainees without calling into question anyone’s integrity.

Mentors should not avoid a discussion on research integrity just because of their own discomfort. The potential consequences for careers and reputations are too severe.


William Neaves is the president emeritus of 
the Stowers Institute for Medical Research in 
Kansas City, Missouri. 
Models of grid cells and theta oscillations


ARISING FROM M. M. Yartsev, M. P. Witter & N. Ulanovsky Nature 479, 103–107 (2011)


Grid cells recorded in the medial entorhinal cortex (MEC) of freely moving rodents show a markedly regular spatial firing pattern whose underlying mechanism has been the subject of intense interest. Yartsev et al.¹ report that the firing of grid cells in crawling bats does not show theta rhythmicity, “causally disproving a major class of computational models” of grid-cell firing that rely on oscillatory interference. However, their data may be consistent with these models, with the apparent lack of theta rhythmicity reflecting slow movement speeds and low firing rates. Thus, the conclusion of Yartsev et al. is not supported by their data.

In oscillatory interference models, path integration is performed by velocity-dependent variation in the frequencies of theta-band oscillations, which combine to generate the grid-cell pattern. In addition, learned associations to environmental sensory inputs (possibly mediated by place cells) ensure that grids are spatially stable over time and are sufficient to maintain firing in familiar environments. In rats, the majority of grid cells show theta-modulated firing, and the model predicts specific relationships between modulation frequency, running velocity and grid scale, which have been verified in grid cells and in putative velocity-controlled oscillatory inputs identified as interneurons within the septohippocampal circuit.
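The path-integration step in these models can be illustrated numerically. The following is a minimal Python sketch in the spirit of the oscillatory interference models discussed here, not the authors' code: each velocity-controlled oscillator (VCO) is assumed to run at a baseline frequency plus β times the component of velocity along its preferred direction, so its phase relative to the baseline is the time integral of that velocity component. Multiplying the three interference envelopes gives firing fields on a triangular lattice. The gain `beta` (cycles per cm), the three directions 60° apart and the function name are assumptions chosen for illustration; with these choices the field spacing works out to 2/(√3·β), about 34.6 cm for β = 1/30 cm⁻¹.

```python
import numpy as np


def interference_envelope(positions, dt, beta=1 / 30.0, directions_deg=(0.0, 60.0, 120.0)):
    """Grid-like firing envelope from three velocity-controlled oscillators (VCOs).

    positions: array of shape (T, 2), animal position in cm, sampled every dt seconds.
    beta: hypothetical velocity gain in cycles per cm (Hz per cm/s).
    Returns an array of length T - 1 with values in [0, 1]; entry k refers to positions[k + 1].
    """
    positions = np.asarray(positions, dtype=float)
    velocity = np.diff(positions, axis=0) / dt  # cm/s
    envelope = np.ones(len(velocity))
    for deg in directions_deg:
        d = np.array([np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))])
        # VCO frequency minus baseline frequency = beta * (v . d); integrating this
        # difference over time is the path-integration step.
        phase_diff = 2 * np.pi * beta * np.cumsum(velocity @ d) * dt
        # Interference with the baseline oscillation: the envelope peaks when phases align.
        envelope *= (1 + np.cos(phase_diff)) / 2
    return envelope


if __name__ == "__main__":
    lam = 30.0  # 1 / beta, in cm
    # Straight run from the origin to the lattice point (lam, lam / sqrt(3)).
    path = np.linspace([0.0, 0.0], [lam, lam / np.sqrt(3)], 201)
    env = interference_envelope(path, dt=0.02)
    print(round(float(env[-1]), 6))  # back on a firing field, so the envelope is close to 1
```

Because only velocity enters the calculation, the envelope depends on where the animal has travelled rather than on how fast it got there, which is the sense in which these models perform path integration.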

Yartsev et al.¹ recorded the firing of grid cells from bats trained to crawl within the recording environment, a behaviour that they perform very slowly (a mean speed of 3.7 cm s⁻¹ versus 17.6 cm s⁻¹ in our rat data), often stopping entirely (supplementary figure 11 in ref. 1). The authors found grid cells with very low firing rates (a mean peak rate of 0.56 Hz versus 5.14 Hz in our data) and little significant theta modulation. However, matching movement speed is important for comparisons involving theta. At low speeds, movement-related theta rhythmicity is strongly attenuated and the need for path integration is reduced. Equally importantly, low firing rates impede detection of theta rhythmicity (5–10 Hz), which requires periods containing plenty of spikes fired within tens to hundreds of milliseconds of each other (something that is absent in bat interspike interval histograms; supplementary figure 2b in ref. 1).
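The dependence on firing rate can be made concrete with a back-of-the-envelope count. For a homogeneous Poisson spike train of rate r recorded for T seconds, the expected number of spike pairs separated by at most w seconds is about r²·T·w, so the pairs available to an autocorrelogram fall with the square of the rate. The Python sketch below assumes Poisson firing (an idealization, not a claim about these cells), a 60-min session and one theta cycle at 8 Hz (125 ms) as the window; the rates are the mean peak rates and the lowest mean bat rate quoted in this exchange.

```python
def expected_pairs(rate_hz, duration_s, window_s):
    """Expected number of spike pairs with lag in (0, window_s] for a homogeneous
    Poisson train; a good approximation when window_s is much shorter than duration_s."""
    # Each of the rate*T spikes is followed, on average, by rate*window spikes within the window.
    return rate_hz ** 2 * duration_s * window_s


if __name__ == "__main__":
    T = 60 * 60.0  # 60-min session, in seconds
    w = 0.125  # one theta cycle at 8 Hz
    for label, rate in [("rat peak", 5.14), ("bat peak", 0.56), ("lowest bat mean", 0.03)]:
        print(f"{label:>16}: {expected_pairs(rate, T, w):10.1f} pairs")
    print("ratio rat/bat:", round(expected_pairs(5.14, T, w) / expected_pairs(0.56, T, w), 1))
```

The ratio depends only on the rates, (5.14/0.56)², roughly 84; at the lowest mean bat rate (0.03 Hz), fewer than one such pair (about 0.4) is expected in an hour under these assumptions.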

We examined whether differences in movement speeds and firing rates between the rat data and the bat data could explain the apparent lack of theta rhythmicity in bat grid cells. We took random samples of 25 cells from a representative data set of 85 grid cells recorded in rat MEC (Fig. 1a, bottom row), extracted periods of slow running to match bat movement speeds, and duplicated these data until they exceeded the duration of the longest bat trial (60 min). We then randomly discarded spikes to match the mean firing rates of each of the 25 published bat grid cells. From the 25 down-sampled rat cells matching each bat grid cell, we selected the one with the median theta index as representative. This process was repeated 10 times. Subjecting the 10 sets of 25 down-sampled cells to the analyses of Yartsev et al. produced a relative absence of theta rhythmicity (Fig. 1b, fourth row). So, if rats moved as slowly as bats and their grid cells fired as infrequently, rat grid cells would show bat levels of gridness (below the higher levels seen in rats), and theta modulation would be very hard to detect.


Figure 1 | Down-sampled rat grid cells and oscillatory interference reproduce bat grid-cell firing. a, b, The firing of grid cells in rats (a) resembles grid-cell firing in bats¹ if the rat data are down-sampled to match the low firing rates and slow movements of the bat data (b). c, d, The oscillatory interference model simulates theta-modulated grid-cell firing in rats (c), and also apparently unmodulated grid-cell firing in bats when firing rates are reduced (d). a–d, Top row, example firing-rate maps (peak rate and gridness, above). Second row, example spike-train autocorrelograms. Third row, distributions of gridness scores. Fourth row, distributions of theta modulation (theta index). Grid cells have gridness > 0.33 (red line). ‘Theta-modulated cells’ have a theta index ≥ 5 (red line). The theta index exceeded the 95th percentile for that cell’s temporally shuffled spike times for 58% of rat cells (a) but only for 2% of cells down-sampled to match the bat data (b; averaged over 10 samples of 25 cells). This rises to 14% if speed is not down-sampled, 20% if only the 25 most strongly theta-modulated rat cells are used and 72% for the 25 most strongly theta-modulated cells if speed is unmatched. However, we do not consider this last cell population to be comparable to the bat grid cells because of the pre-selection of only the most strongly theta-modulated cells and the difference in movement speed between running rats and crawling bats. Theta index, gridness and shuffling follow ref. 1 (in which theta index is theta power divided by mean power 0–50 Hz), except for a, bottom row, which shows theta index calculated following ref. 13 (that is, theta power divided by mean power 0–125 Hz), giving higher values that match the proportions of theta-modulated cells in ref. 13 (which range from 62% in layer V, where most bat cells were recorded, to 90% in layer III). e, Schematic showing how theta-modulated inhibitory spike trains (top, black ticks) drive the grid cell’s membrane potential (middle, black trace), producing spikes when exceeding a threshold (middle, red dashed line). Spatial firing fields (bottom) are defined by constructive interference (top, grey lines show theta modulation; middle, grey line shows the resulting interference pattern), but the underlying oscillations are undetectable at low firing rates (see Methods for details).
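The two ingredients of this down-sampling test, random spike deletion and a theta index computed from the spike-train autocorrelogram, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' analysis code: the autocorrelogram spans ±500 ms in 2-ms bins, it is mean-normalized and its DC component removed before the FFT, and the theta index is taken as mean power in 5–10 Hz divided by mean power in 0–50 Hz, following the definition quoted in the caption (using the band mean for "theta power" is itself an assumption). Absolute values therefore need not match the ≥ 5 threshold in the figure, and the function names are hypothetical.

```python
import numpy as np


def thin_spikes(spike_times, target_rate_hz, duration_s, rng):
    """Randomly delete spikes so that the mean rate is about target_rate_hz."""
    spike_times = np.asarray(spike_times, dtype=float)
    n_keep = min(len(spike_times), int(round(target_rate_hz * duration_s)))
    keep = rng.choice(len(spike_times), size=n_keep, replace=False)
    return np.sort(spike_times[keep])


def theta_index(spike_times, bin_s=0.002, max_lag_s=0.5, theta_band=(5.0, 10.0), f_max=50.0):
    """Mean autocorrelogram power in theta_band divided by mean power in 0..f_max Hz."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    n_bins = int(round(2 * max_lag_s / bin_s))
    edges = np.linspace(-max_lag_s, max_lag_s, n_bins + 1)
    acg = np.zeros(n_bins)
    for i, ti in enumerate(t):
        lo = np.searchsorted(t, ti - max_lag_s, side="left")
        hi = np.searchsorted(t, ti + max_lag_s, side="right")
        lags = np.delete(t[lo:hi] - ti, i - lo)  # drop the spike paired with itself
        acg += np.histogram(lags, bins=edges)[0]
    if acg.sum() == 0:
        return float("nan")  # too few spikes for any spike pairs within the lag window
    acg = acg / acg.mean() - 1.0  # mean-normalize, then remove the DC offset
    power = np.abs(np.fft.rfft(acg)) ** 2
    freqs = np.fft.rfftfreq(n_bins, d=bin_s)
    in_theta = (freqs >= theta_band[0]) & (freqs <= theta_band[1])
    baseline = power[freqs <= f_max].mean()
    return float(power[in_theta].mean() / baseline) if baseline > 0 else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    duration = 60.0
    # Hypothetical cell firing once per 8-Hz cycle with 5-ms jitter.
    rhythmic = np.sort(np.arange(0.0, duration, 1 / 8) + rng.normal(0.0, 0.005, 480))
    print("rhythmic, 8 Hz:", round(theta_index(rhythmic), 2))
    for rate in (2.0, 0.5, 0.1):
        sparse = thin_spikes(rhythmic, rate, duration, rng)
        print(f"thinned to {rate} Hz ({len(sparse)} spikes):", round(theta_index(sparse), 2))
```

Thinning leaves the underlying rhythm intact but removes the spike pairs needed to see it, so the index becomes noisy and eventually undefined as the rate drops.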

Most importantly, disproving the model requires knowing how much theta rhythmicity it predicts in low-firing-rate cells. Simulations (using code adapted from ref. 7) with strong theta modulation and typical firing rates for rats (Fig. 1c) also lack significant theta modulation when firing rates are reduced to bat levels (Fig. 1d, fourth row). Although spatially modulated firing is driven by interference between theta-modulated inputs, the theta rhythmicity is undetectable in low-rate spike trains (Fig. 1e).

Local field potentials and multi-unit activity were also reported in bats¹, but these reflect the physical arrangement and coherence of populations of cells, which may vary between species and are not addressed by the model (although spatially offset grids require phase-offset oscillators, suggesting no overall phase preference in the model). Finally, consistent with the model, grids might be set up through oscillatory interference during the initial training of the bats not to fly out of the box (by physically blocking them from above), and maintained (at lower firing rates) by learned sensory associations during subsequent slow crawling in the now highly familiar box.


Methods 


The activity of 85 grid cells was recorded from superficial and deep layers of rat MEC during 20 min foraging in 1-m² arenas using standard procedures. Random samples of 25 cells were speed matched by removing periods of fast running, retaining periods of ≥0.5 s, until the median speed was 3.7 cm s⁻¹. Speed-matched data were duplicated and concatenated to exceed the duration of the longest bat trial (60 min). Cell firing rates were down-sampled by randomly removing spikes, in turn, to match the mean firing rate of each of the 25 bat grid cells (mean rate taken as 25% of the peak rates found by Yartsev et al. (range of mean rates, 0.03–0.40 Hz)). Spike-train autocorrelograms combined the individual autocorrelograms from each period of slow running and were mean-normalized to avoid low-frequency power reducing the theta index (compare with figure 4g in ref. 1). Grid cells were simulated as leaky integrate-and-fire neurons (time constant 20 ms) receiving three oscillatory inhibitory spike trains (Poisson processes with rate 50 + 30cos(2πft), in which frequency (f) varies around 8 Hz according to running velocity, with a peak inhibitory synaptic conductance of 14 pS) and a noisy persistent excitatory current sampled from N(m, 2m), in which m = 336 nA for low firing rates and m = 436 nA for high rates (mean peak rates are 0.48 Hz and 5.11 Hz, respectively).
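A simulation of this general kind can be approximated in a few lines. The following is a hedged, dimensionless Python sketch, not the authors' code (which was adapted from ref. 7): only the 20-ms time constant, the three inhibitory Poisson inputs with rate 50 + 30cos(2πft) and the 8-Hz baseline frequency come from the text above. The membrane potential is in arbitrary units with threshold 1, each inhibitory spike is modelled as a fixed voltage step (`inh_step`) rather than a conductance, and the drive statistics, velocity gain `beta`, input directions and constant running velocity are hypothetical values chosen only so that the neuron fires.

```python
import numpy as np


def simulate_lif(duration_s=10.0, dt=1e-4, tau_s=0.020, drive_mean=1.5, drive_sd=0.5,
                 inh_step=0.05, f0=8.0, beta=0.05, velocity=(10.0, 0.0),
                 directions_deg=(0.0, 120.0, 240.0), v_thresh=1.0, v_reset=0.0, seed=0):
    """Leaky integrate-and-fire neuron driven by noisy excitation and three
    theta-modulated inhibitory Poisson spike trains. Returns spike times (s)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(duration_s / dt))
    t = np.arange(n_steps) * dt
    vel = np.asarray(velocity, dtype=float)

    # Inhibitory input k: Poisson rate 50 + 30*cos(2*pi*f_k*t), where the
    # frequency f_k = f0 + beta * (velocity . direction_k) depends on running velocity.
    n_inh = np.zeros(n_steps, dtype=int)
    for deg in directions_deg:
        d = np.array([np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))])
        f_k = f0 + beta * (vel @ d)
        rate = 50.0 + 30.0 * np.cos(2 * np.pi * f_k * t)
        n_inh += rng.random(n_steps) < rate * dt  # at most one event per small time step

    drive = drive_mean + drive_sd * rng.standard_normal(n_steps)  # noisy excitatory input
    v = v_reset
    spikes = []
    for i in range(n_steps):
        v += dt * (drive[i] - v) / tau_s - inh_step * n_inh[i]  # leak + drive - inhibition
        if v >= v_thresh:
            spikes.append(t[i])
            v = v_reset
    return np.array(spikes)


if __name__ == "__main__":
    for drive in (1.5, 1.05):
        spikes = simulate_lif(drive_mean=drive)
        print(f"mean drive {drive}: {len(spikes) / 10.0:.2f} Hz")
```

Lowering the mean excitatory drive lowers the firing rate while leaving the theta-modulated inhibition unchanged, which mirrors how the Methods obtain low-rate and high-rate versions of the same model cell by changing m.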
REPLYING TO C. Barry, D. Bush, J. O’Keefe & N. Burgess Nature 488, http://dx.doi.org/nature11276 (2012)


Barry et al.¹ propose that it is impossible to detect theta rhythmicity in bat grid cells because of their slow movement velocities and low firing rates; hence, they posit that our findings² do not refute the oscillatory interference models of mammalian grid cells. To support this claim, they use a data set of rat grid cells of which only 58% were theta modulated, and constrained their analysis to periods of near immobility in the rat, a behavioural state in which theta is known to be absent³. Despite these biases, we argue that their own analysis showed that down-sampled rat cells were substantially more theta-modulated than real grid cells from bats, and we demonstrate further that the bat data have adequate statistical power to detect theta rhythmicity, if it were present in bat grid cells. Finally, Barry et al. focused solely on ‘first generation’ oscillatory interference models, ignoring our disproval of ‘second generation’ models. We thus uphold our original results and interpretation².

Barry et al. analysed a data set of 85 rat grid cells, of which only 58% were significantly theta-modulated to begin with (although oscillatory interference models require 100% of cells to be theta-modulated). The strength of theta modulation in their data was lower than in the much larger data set publicly available from the Moser laboratory (a median theta index of 10.9 in Barry et al. compared to 14.23 in data from the Moser laboratory⁴), which may lower the detectability of theta rhythmicity after the data of Barry et al. are down-sampled.


©2012 Macmillan Publishers Limited. All rights reserved 


BRIEF COMMUNICATIONS ARISING 


Barry et al. proposed that bats’ slow crawling velocity reduces theta rhythmicity in bat grid cells. Consequently, they selectively removed portions of the rat data, retaining short periods, down to 500 ms in duration, until a median speed of 3.7 cm s⁻¹ was achieved. This procedure has several flaws.

First, although these velocities correspond to an active movement state of bats, they are equivalent to nearly complete immobility in rats. Barry et al. thus compared a state in rats in which theta is not expected (near immobility) to a state in bats in which theta, if existing, should be most prominent (active movement). Furthermore, in their model, for the constant β (which determines the velocity modulation of dendritic inputs), Barry et al. used a value derived exclusively from rat data. However, different mammalian species would probably have different velocity dependences in dendritic inputs; for example, when modelling grid cells in cheetah versus sloth, it would not make sense to use a β value taken from rat, and the same goes for the modelling of grid cells in bats. Thus, we contend that movement speed should not be matched, either when simulating a model or when down-sampling data from one species to mimic another.

Second, Barry et al. used short portions of near-immobility data, as 
short as 500 ms in duration, creating very intermittent, unrealistic 
spike trains; this tapers down the oscillatory cycles (because they 
estimated 1000-ms autocorrelations using 500-ms data epochs) and 
could induce an unwarranted statistical bias downwards in detect- 
ability of theta rhythmicity. 
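The tapering point can be quantified with a standard property of autocorrelograms computed within finite epochs. If a stationary spike train is cut into epochs of length L and pairs are counted only within each epoch, the number of pairs available at lag τ scales with L − τ, so relative to an unbroken recording it is reduced by a factor of max(0, 1 − τ/L). The short Python sketch below assumes stationarity and an 8-Hz rhythm (autocorrelogram peaks every 125 ms); the function name is hypothetical, and the sketch does not by itself establish how far the mean normalization described in Barry et al.'s Methods offsets the effect.

```python
def pair_weight(lag_s, epoch_s):
    """Fraction of spike pairs at lag lag_s that fit inside an epoch of length epoch_s,
    relative to an unbroken recording (assumes a stationary spike train)."""
    return max(0.0, 1.0 - lag_s / epoch_s)


if __name__ == "__main__":
    theta_lags = [0.125, 0.25, 0.375, 0.5]  # successive peaks of an 8-Hz rhythm
    for epoch in (0.5, 1.0, 60.0):
        weights = [round(pair_weight(lag, epoch), 3) for lag in theta_lags]
        print(f"epoch {epoch:>5} s:", weights)
```

With 500-ms epochs the fourth theta peak (at 500 ms) receives no pairs at all and the first is already down to three-quarters weight, whereas in a 60-s recording the weights stay close to 1.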

Third, Barry et al. found that when firing rates were matched to those of bats while movement speed was left untouched, 24% of their theta-modulated grid cells retained significant theta rhythmicity (14% ÷ 58% ≈ 24%), substantially higher than the 4% (1/25) theta-modulated grid cells in bats². Notably, when Barry et al. analysed the top 51% of their theta-modulated rat grid cells (51% of the 58% modulated cells = 25 neurons), which are the most relevant cells to consider for their model (especially given the weak theta rhythmicity in their data set), they found that when firing rates are matched to those of bats and velocities are left untouched, the large majority (72%) of rat grid cells retained significant theta rhythmicity. Thus, down-sampled rat grid cells were markedly more oscillatory than our bat grid cells, supporting our original analysis and interpretation².
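The percentages in this paragraph can be checked directly. The small Python sketch below uses only numbers quoted in the two communications: the 24% figure is a ratio (14% of all rate-matched cells relative to the 58% that were theta-modulated to begin with), not a product, and 51% of the 58% of 85 cells gives the 25-cell subset.

```python
n_rat_cells = 85
frac_theta_modulated = 0.58  # rat cells significantly theta-modulated before down-sampling
frac_rate_matched = 0.14  # still significant after matching firing rate only (speed unmatched)

# Share of the originally theta-modulated cells that stay significant (a ratio, not a product).
retained = frac_rate_matched / frac_theta_modulated
print(f"retained: {retained:.0%}")  # about 24%

# Size of the 'top 51%' subset of theta-modulated cells.
subset = n_rat_cells * frac_theta_modulated * 0.51
print(f"subset size: {subset:.1f} cells")  # about 25

bat_fraction = 1 / 25  # one significantly theta-modulated bat grid cell out of 25
print(f"bat: {bat_fraction:.0%}")
```

Multiplying instead (0.14 × 0.58) would give about 8%, which is why the ratio form is the one that supports the 24% figure.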

Last, Barry et al. considered only single-cell, first-generation oscillatory interference models of grid cells, which have been criticized as theoretically problematic. Some of these problems have been rectified in recent second-generation versions of these models, which used networks of coupled oscillators and explicitly predicted network-level theta oscillations. This was contradicted by our bat data, in which brief theta oscillations occurred very rarely in the local-field potential², and multi-unit firing (reflecting network activity) never showed any theta oscillations².

In conclusion, we feel that the analysis by Barry et al. fails to support their main argument, namely that the statistical power of the bat data does not allow detecting theta rhythmicity. To resolve this debate, we propose to make use of a large, unbiased, publicly available data set of rat grid cells, such as that on the Moser laboratory website, which would allow transparency in analysis techniques and in the baseline rat data being used. Furthermore, we expect that neural recordings from single units in flying bats, in which movement velocities and neuronal firing rates are expected to be much higher, will provide another key approach.


Michael M. Yartsev¹, Menno P. Witter² & Nachum Ulanovsky¹
¹Department of Neurobiology, Weizmann Institute of Science, Rehovot 76100, Israel.

email: nachum.ulanovsky@weizmann.ac.il

²Kavli Institute for Systems Neuroscience and Centre for the Biology of Memory, Norwegian University of Science and Technology, NO-7489 Trondheim, Norway.


1. Barry, C., Bush, D., O’Keefe, J. & Burgess, N. Models of grid cells and theta 
oscillations. Nature 488, http://dx.doi.org/nature11276 (2012).

2. Yartsev, M. M., Witter, M. P. & Ulanovsky, N. Grid cells without theta oscillations in 
the entorhinal cortex of bats. Nature 479, 103-107 (2011). 

3. Buzsaki, G. Theta oscillations in the hippocampus. Neuron 33, 325-340 (2002). 

4. Sargolini, F. et al. Conjunctive representation of position, direction, and velocity in 
entorhinal cortex. Science 312, 758-762 (2006). 

5. Jeewajee, A., Barry, C., O’Keefe, J. & Burgess, N. Grid cells and theta as oscillatory
interference: electrophysiological data from freely moving rats. Hippocampus 18, 
1175-1185 (2008). 

6. Wills, T. J., Cacucci, F., Burgess, N. & O’Keefe, J. Development of the hippocampal 
cognitive map in preweanling rats. Science 328, 1573-1576 (2010). 

7. Burgess, N., Barry, C. & O’Keefe, J. An oscillatory interference model of grid cell 
firing. Hippocampus 17, 801-812 (2007). 

8. Giocomo, L. M., Zilli, E. A., Fransén, E. & Hasselmo, M. E. Temporal frequency of 
subthreshold oscillations scales with entorhinal grid cell field spacing. Science 
315, 1719-1722 (2007). 

9. Giocomo, L. M., Moser, M.-B. & Moser, E. I. Computational models of grid cells.
Neuron 71, 589-603 (2011). 

10. Zilli, E. A. & Hasselmo, M. E. Coupled noisy spiking neurons as velocity-controlled 
oscillators in a model of grid cell spatial firing. J. Neurosci. 30, 13850-13860 
(2010). 


Author Contributions All authors contributed to writing this reply. 


doi:10.1038/nature11277 


2 AUGUST 2012 | VOL 488 | NATURE | E3 




NEWS & VIEWS 


PALAEONTOLOGY 


An insect to fill the gap 


A complete insect fossil from the Devonian period has long been sought. The finding of a candidate may improve 
our patchy understanding of when winged insects evolved. SEE LETTER P.82 


WILLIAM A. SHEAR 


Insects are, in terms of species number, the most successful group of animals ever to have lived. But their evolutionary origins are a source of controversy, and will continue to be so until the fossil record finally yields up unequivocal evidence of insect beginnings. On page 82 of this issue, Garrouste et al.¹ claim to have found precisely this. Although it can hardly be described as well preserved, the fossil shows a six-legged thorax, long single-branched antennae, triangular jaws and a 10-segmented abdomen (see Fig. 2 of the paper¹). Insects are the only known arthropods (joint-legged invertebrate animals) with this anatomical combination, allowing the authors to make a strong case for the fossil’s insectan identity.

The 8-millimetre-long fossil, which the authors named Strudiella devonica, was found in a small rock slab excavated at a quarry in Belgium. Strudiella is approximately 370 million years old, which places it late in the Devonian period (Fig. 1). This was the time when terrestrial ecosystems were first assembling from their aquatic progenitors — the first forests were established and the earliest four-legged vertebrates were crawling out from freshwater pools onto land. So far, only suggestive traces of insects have been found in rocks of this age. The famous Rhynie chert, a sedimentary deposit in Scotland that is about 402 million years old, contains fossils of collembolans, a class of animal that contains today’s ubiquitous springtails, and which is regarded as closely related to insects. The Rhynie chert has also yielded a pair of jaw fossils called Rhyniognatha, which may be from an advanced, winged insect. In New York state, some fossilized scraps of characteristic cuticle and the framework of a single compound eye, perhaps from a primitive, wingless insect, have been found in 385-million-year-old rocks. But these fragments more or less complete the picture of all that is known of insects at this crucial time in Earth’s history.

There have also been some false alarms. For example, the fossilized head of a wingless insect found in Canadian strata somewhat older than the New York deposits is almost certainly a contaminant — a much more recent or contemporary insect lodged in a crack in the rocks. And Leverhulmia mariae, also from Scottish chert near Rhynie, could have been an insect, a close relative, or neither — it seems to have too many legs to be easily classified.

So although its age makes it too late to be an insect ancestor, or even the earliest insect, Strudiella is nonetheless of great potential significance as the oldest complete insect fossil yet found. This is the first and primary point speaking to its importance.


Figure 1 | Winged beginnings. The fossil record provides ample examples of winged insects from around 325 million years ago, such as the order Palaeodictyoptera from the Carboniferous period. However, there is little evidence of insect evolution before this time; only a handful of fossils, including 402-million-year-old Rhyniella praecursor, which appears similar to extant collembolan arthropods, have been found. But none of these few examples comes from the period between 385 million and 325 million years ago, which is referred to as the Hexapoda gap (striped region) because of the lack of insect evidence. Now, however, Garrouste et al.¹ report the finding of Strudiella devonica, a fossil dated to 370 million years ago that shows multiple anatomical features that are characteristic of insects.

We can perceive only what the fossil 
record permits us to perceive. From our 
current viewpoint, the diversification of 
insects and of our own terrestrial verte- 
brate ancestors seems to have occurred 
in two evolutionary bursts. Between
425 million and 385 million years 
ago, both groups probably originated 
and underwent an initial evolution- 
ary radiation as they began occupying 
the newly available subaerial realm. 
There then follows a long period, called 
Romer’s gap (360 million to 345 mil- 
lion years ago) for the vertebrates and 
the longer Hexapoda gap for insects 
(385 million to 325 million years ago), 
during which few, if any, fossils of these 
groups can be found (Fig. 1). Then, 
with apparent suddenness, an explosive 
appearance of many new forms takes 
place in the second round of diversi- 
fication. For the insects, large winged 
species of the major groups (mayflies, 
proto-dragonflies and others, includ- 
ing extinct types) show up, seemingly 
without precursors. The insects were 
off and running on their way to world 
domination. 

These gaps, and the two bouts of evo- 
lution that they create, may or may not 
be real. There is evidence that a period 
of low atmospheric oxygen concentra- 
tion coincided with the gap period, and 
this could have suppressed the rate of 
appearance of novel anatomy. But a
more parsimonious explanation is sim- 
ply that we have not yet found the right 
rock formations to reveal fossils that would fill 
in the gaps. For example, most of the exposed 
strata for this period in Europe and North 
America are of marine, not land, origin. 

This brings us to the second reason for 
the importance of Strudiella — it is dated to 
a time smack in the middle of the Hexapoda 
gap (Fig. 1). According to Garrouste et al., 
this significantly narrows the gap. And if, as 
the authors suggest, the fossil came from the 
young stage of an animal that would have had 
wings as an adult, their finding would mean 
that winged insects originated much earlier 
than fossils have heretofore told us, and that
the sudden appearance of many winged kinds
around 325 million years ago is deceptive. It 
would also suggest that the Rhyniognatha 
fossils could indeed be the mandibles of 
a winged insect, and that the diversification 
of winged species could have taken place 
at a much more leisurely pace, over some 
45 million years. 

Considering the crucial role of insects in 
present-day ecology, the number of people 
engaged in studying their fossil history is dis- 
mally small. Furthermore, current specialists 
focus mostly on events from the Mesozoic 
period — the ‘Age of Dinosaurs’ — which 
began some 70 million years after the time of 
the first known winged insect fossils, or on 
even more recent amber-preserved insects,
which are largely indistinguishable from living
forms. The beginnings of the insects are to
be found in rocks much older even than those 
that enclosed Strudiella, but almost no one is 
looking for them. The paltry few insect fossils 
contemporary with Strudiella — and indeed 
Strudiella itself — were serendipitous, not 
deliberate, finds, and the Hexapoda gap still
looms large.
The balance of the
carbon budget 


Careful analysis reveals that the global uptake of anthropogenic carbon dioxide 
INGEBORG LEVIN

Precisely quantifying the fate of man-made carbon dioxide is vital for reliably estimating future atmospheric CO₂ levels and the contribution of this greenhouse gas to global climatic change. There has been much debate about whether currently active carbon sinks are likely to change drastically in the near future, or might already have weakened during the past few decades. On page 70 of this issue, Ballantyne and colleagues clarify matters. By calculating the worldwide increase in the atmospheric CO₂ burden during the past 50 years from precise global observations, and by carefully accounting for anthropogenic carbon sources and their uncertainties, they find that total global carbon sinks have not declined during this period. Rather, they have increased more than twofold — although fossil-fuel CO₂ emissions have risen almost fourfold over this time.
Roughly half of the anthropogenic CO2 emissions caused by the burning of fossil fuels and by land-use change (such as deforestation) are currently taken up by the oceans or re-enter the terrestrial biosphere, for example through afforestation or increased biomass production by plants. These carbon sinks have been efficiently damping increases in anthropogenic CO2 (Fig. 1), but will not necessarily continue to do so. Possible mechanisms that could result in reduced carbon uptake include changes in carbon chemistry and mixing in the oceans⁷, and potential feedback from land ecosystems such as increased decomposition of soil organic matter caused by rising global temperatures⁸.

[Figure 1 plot: CO2 (axis 300 to 500) against year (1960 to 2010), with curves for atmospheric CO2 and for accumulated total anthropogenic CO2 emissions.]

Figure 1 | Partial absorption of anthropogenic carbon dioxide. The graph compares atmospheric levels of carbon dioxide since the 1960s (measured at Mauna Loa, Hawaii¹¹) with the levels that would have occurred as a result of the accumulation of anthropogenic CO2 emissions in the absence of carbon sinks (as calculated by Ballantyne et al.⁴). The amount of CO2 that has accumulated in sinks is represented by the difference between the two curves (green arrow).

To assess variations in carbon sinks over the past 50 years, Ballantyne et al. used a strikingly simple approach: they calculated the annual changes in the global atmospheric CO2 inventory from a worldwide network of long-term observations. By subtracting the annual total amount of anthropogenic CO2 emissions from these changes, they quantified the net CO2 uptake by the land and oceans each year. This fundamental strategy reminds us that from time to time we should step back and carefully consider the basic inputs and outputs of the carbon cycle at a global level. Although similar work has been done before⁹, Ballantyne et al. go one step further by quantifying an increase in global CO2 uptake as worldwide emissions are increasing.
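The subtraction at the heart of this approach fits in a few lines. The sketch below is illustrative, not Ballantyne and colleagues' code: it assumes the commonly used conversion of about 2.12 petagrams of carbon (PgC) per part per million (p.p.m.) of atmospheric CO2, the function name `net_sink` is hypothetical, and the yearly numbers are invented placeholders rather than data from the study. Uptake is counted as positive, so the sink is total emissions minus the observed atmospheric increase.

```python
# Minimal sketch of a global CO2 mass balance (illustrative numbers only).
PGC_PER_PPM = 2.12  # assumed conversion: 1 p.p.m. of atmospheric CO2 is about 2.12 PgC

def net_sink(emissions_pgc, atm_increase_ppm):
    """Net land + ocean uptake in PgC per year (positive means uptake).

    emissions_pgc: fossil-fuel plus land-use-change emissions for the year, in PgC
    atm_increase_ppm: observed rise in global mean atmospheric CO2 that year, in p.p.m.
    """
    atm_increase_pgc = atm_increase_ppm * PGC_PER_PPM
    # Whatever was emitted but did not stay in the atmosphere went into sinks.
    return emissions_pgc - atm_increase_pgc

# Hypothetical yearly values (not from Ballantyne et al.)
years = [1960, 1985, 2010]
emissions = [3.5, 6.5, 10.0]   # PgC per year
rise_ppm = [0.9, 1.5, 2.4]     # p.p.m. per year

for year, e, r in zip(years, emissions, rise_ppm):
    s = net_sink(e, r)
    print(f"{year}: sink {s:.2f} PgC/yr, airborne fraction {1 - s / e:.2f}")
```

The airborne fraction printed alongside is the share of each year's emissions that stayed in the atmosphere; the remainder is the net land plus ocean sink.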

To determine the uncertainties of their estimated carbon budget, the authors combined the most comprehensive, high-accuracy measurements of atmospheric CO2 available with three inventories of estimated global fossil-fuel CO2 emissions, and with three inventories of emissions estimated to have been caused by land-use change. The inventories had been made independently of each other, and were obtained using different methods. Bringing together all these data allowed the authors to reliably calculate the errors of their different budget terms. Their finding that the net carbon sink is increasing is therefore robust, and proves that ocean and land sinks have not both decreased in the last half century. How the total sink was divided between oceans and land in the past, and whether the oceans’ CO2-uptake rate has decreased, as suggested by some modelling studies’, remain open questions.
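A small Monte Carlo ensemble shows how pairing independent inventories turns into an uncertainty on the sink trend. This is a sketch under stated assumptions, not the authors' procedure: the error model (independent Gaussian noise, a 5% relative error on fossil-fuel emissions, three times that on land-use emissions, 0.2 PgC on the atmospheric increase), the helper names and all annual values (in PgC per year) are hypothetical. Each of three fossil-fuel inventories is paired with each of three land-use inventories, every budget term is perturbed, and a least-squares trend is fitted to each resulting sink series.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def sink_trend_ensemble(years, atm_increase, fossil_sets, landuse_sets,
                        atm_sigma=0.2, rel_sigma=0.05, n_draws=200, seed=0):
    """Trends of the net sink (PgC/yr per year) across inventories and random errors.

    All error magnitudes are illustrative assumptions.
    """
    rng = random.Random(seed)
    slopes = []
    for fossil in fossil_sets:
        for landuse in landuse_sets:
            for _ in range(n_draws):
                sinks = [
                    f * (1 + rng.gauss(0, rel_sigma))          # fossil-fuel emissions
                    + lu * (1 + rng.gauss(0, 3 * rel_sigma))   # land-use emissions (less certain)
                    - (a + rng.gauss(0, atm_sigma))            # observed atmospheric increase
                    for f, lu, a in zip(fossil, landuse, atm_increase)
                ]
                slopes.append(ols_slope(years, sinks))
    return slopes

# Hypothetical decadal snapshots in PgC per year (not data from the study)
years = [1960, 1970, 1980, 1990, 2000, 2010]
fossil = [2.6, 4.0, 5.3, 6.1, 6.8, 9.1]
landuse = [1.5, 1.4, 1.3, 1.4, 1.2, 1.0]
atm_increase = [1.9, 2.3, 3.2, 3.2, 3.9, 5.0]

# Three stand-in "inventories" for each emission term
fossil_sets = [[x * s for x in fossil] for s in (0.97, 1.0, 1.03)]
landuse_sets = [[x * s for x in landuse] for s in (0.8, 1.0, 1.2)]

slopes = sink_trend_ensemble(years, atm_increase, fossil_sets, landuse_sets)
share_positive = sum(s > 0 for s in slopes) / len(slopes)
print(f"median trend {statistics.median(slopes):.3f} PgC/yr per year; "
      f"{share_positive:.0%} of ensemble members increase")
```

If nearly all ensemble members show a positive slope, an increasing net sink is robust to the assumed errors; with real data, the error magnitudes themselves would have to come from the inventories.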

2 AUGUST 2012 | VOL 488 | NATURE | 35

Why is it so important to know where half of the carbon that we are currently emitting into the atmosphere actually goes? One reason involves the variable sustainability of carbon sinks. It makes a big difference whether the extra carbon emitted is stored in reservoirs such as the deep oceans, where it could stay for hundreds or thousands of years, or whether it is taken up by the growth of new forests, where it would stay for only a few years or decades before being returned to the atmosphere (either as respired CO2 or when wood from the forests is used to produce energy). Another equally important reason is the need to understand the processes responsible for carbon uptake, because this knowledge will allow reliable predictions to be made of future atmospheric CO2 abundance.
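The contrast between short- and long-lived reservoirs can be made concrete with a deliberately crude one-box picture. The sketch assumes that carbon leaves each reservoir by first-order (exponential) release with a single mean residence time; the residence times used (30 years for regrowing forest, 1,000 years for the deep ocean) are illustrative assumptions chosen to fall within the ranges mentioned above, not measured values, and `fraction_remaining` is a hypothetical helper.

```python
import math

def fraction_remaining(years_elapsed, residence_time):
    """Fraction of stored carbon still in a reservoir after `years_elapsed` years,
    assuming first-order (exponential) release with mean residence time `residence_time`."""
    return math.exp(-years_elapsed / residence_time)

# Illustrative residence times in years (assumptions, not measured values)
reservoirs = {"regrowing forest biomass": 30.0, "deep ocean": 1000.0}

for name, tau in reservoirs.items():
    left = fraction_remaining(100, tau)
    print(f"{name}: {left:.2f} of the stored carbon left after 100 years")
```

Under these assumptions most of the forest carbon is back in the atmosphere within a century, whereas nearly all of the deep-ocean carbon is still stored.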

This brings us back to the big question posed by Ballantyne and colleagues’ work: if CO2 uptake has increased, where is all the carbon going? Have we overlooked any major carbon-sink processes and, if so, do we have the right observational strategies in place to detect and quantify these sinks?

We certainly do not have enough precise, consistent, long-term measurements of carbon fluxes and stocks on land, in the oceans or in the atmosphere at all relevant scales; such measurements form the backbone of our current understanding of the carbon cycle. For example, the monitoring of changes in carbon uptake by the Southern Ocean requires more precise and highly comparable measurements to be made in that region. The same is true for land areas, such as permafrost regions, that could undergo large changes as a result of changing climatic conditions. Satellite observations may help to provide data for regions for which we have no in situ measurements, but at present they are not able to deliver the calibrated, long-term and consistent observations required to quantify global carbon-budget changes. We must therefore set up and maintain a comprehensive framework to monitor all carbon compartments in situ. By combining data about bottom-up ocean and land fluxes with those from an atmospheric observational network, and integrating the information at regional, continental and global scales, we should be able to obtain urgently needed answers about carbon sinks.

Besides better global coverage of observations, more-accurate numbers on CO2 emissions from energy production and land-use change are required. Although such data seem reliable for developed countries, those from emerging economies (the regions that are taking over the lead in energy consumption and CO2 emissions) make estimates of CO2 inventories increasingly uncertain¹⁰. Improvements in, and full transparency of, carbon accounting would help not only in negotiations aimed at reducing CO2 emissions, but also in precisely budgeting natural CO2 fluxes. This is true both on the global scale studied by Ballantyne et al. and for regional fluxes where carbon sinks are active. ■

DRUG DISCOVERY

Kill the messenger where it lives

A mutant repeating DNA sequence produces a toxic RNA molecule that causes the neuromuscular disorder myotonic dystrophy type 1. An ‘antisense’ therapy that targets this RNA in cell nuclei shows promise in mice. SEE LETTER P.111

PETER K. TODD & HENRY L. PAULSON

Myotonic dystrophy type 1 is a devastating inherited disorder for which effective treatments are lacking. It is characterized by myotonia (delayed relaxation of muscles after contraction), weakness, cardiac arrhythmias, diabetes and cognitive changes. Patients carry a mutant DMPK gene that contains a greatly expanded DNA repeat sequence within a non-coding region’. When the mutant gene is expressed, it yields a messenger RNA molecule that seems to be toxic to cells such as muscle fibres*, which means that the inheritance of a single mutant copy of DMPK from either parent is sufficient to cause disease. On page 111 of this issue, Wheeler and colleagues’ describe the successful use of antisense oligonucleotides (synthetic RNA-like fragments that bind to a target RNA) to correct the molecular and physiological features of the disease in a mouse model.

The mutation in DMPK elicits the toxicity associated with myotonic dystrophy type 1 (DM1) through at least two mechanisms’. Usually, DMPK-encoded mRNA is synthesized in the cell’s nucleus and efficiently exported to the cytoplasm, where it is translated into protein (Fig. 1a). However, owing to its expanded repeat, the mutant RNA forms a hairpin-shaped structure that binds to members of the MBNL family of proteins, which regulate RNA splicing, a process by which immature mRNAs are cut up and reassembled before translation (Fig. 1b). As a result, the mutant RNA and the proteins form aggregates (‘foci’) in the nucleus, and MBNL activity is decreased. In a second mode of action, the mutation somehow triggers increased levels of a different splicing protein, CELF1. The opposing effects of the DMPK mutation on CELF1 and MBNL activities disrupt RNA splicing and so lead to disease’.
A potential strategy for treating DM1 is to ‘silence’ the mutant DMPK using antisense oligonucleotides (Fig. 1c). Because patients carry a functional copy of the gene in addition to the altered one, oligonucleotides could be designed that would bind selectively to the mutant RNA and mark it for degradation, while leaving translation of the functional RNA unaffected. This strategy can be tested in a mouse model of DM1, in which the expanded repeat from a mutant DMPK has been added to an unrelated gene, the expression of which is easy to track in muscle’. Indeed, it has been shown that direct injection of a specific type of antisense oligonucleotide (morpholino oligonucleotides) into the muscles of such mice reduced the toxicity of the mutant RNA, although systemic delivery proved inefficient’.
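One way to favour the mutant transcript is to target the expanded repeat itself. In DM1 the repeat is CTG in the DNA, so the toxic transcript carries a long run of CUG; an antisense oligonucleotide complementary to a few repeat units therefore finds far more binding sites on the expanded RNA than on the normal one. The sketch below only illustrates that idea: the sequences and repeat counts are hypothetical, the helper names are invented, perfect-match counting ignores mismatches and RNA folding, and the article does not state that Wheeler and colleagues used a repeat-targeting design.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Antisense (reverse-complement) RNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def count_binding_sites(target, antisense):
    """Number of positions (overlaps allowed) where `antisense` pairs perfectly with `target`."""
    site = reverse_complement(antisense)  # the stretch of target the oligo pairs with
    width = len(site)
    return sum(1 for i in range(len(target) - width + 1)
               if target[i:i + width] == site)

# Hypothetical transcripts: a short normal repeat and an expanded mutant repeat
normal = "GGA" + "CUG" * 12 + "AAU"
mutant = "GGA" + "CUG" * 500 + "AAU"

oligo = reverse_complement("CUG" * 5)   # 15-nucleotide antisense oligo against (CUG)5
print(oligo)
print(count_binding_sites(normal, oligo), count_binding_sites(mutant, oligo))
```

The difference in site counts grows with repeat length, which is why a longer expansion makes the mutant RNA a much larger target than the normal allele.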

Wheeler et al. used ‘gapmer’ antisense oligonucleotides, which contain chemical modifications at their ends that make them more stable. Moreover, gapmer oligonucleotides include a central sequence that, when bound to its target RNA, promotes target cleavage by the enzyme RNase H. The authors administered gapmer oligonucleotides to DM1 model mice by subcutaneous injection, and observed a robust and sustained decrease in the concentration of the toxic RNA in muscle, even when applying relatively low doses. This decrease was associated with improvement in a wide range of disease features. For example, the authors observed loss of nuclear foci, release of MBNL proteins from nuclear aggregates,


| RESEARCH | NEWS & VIEWS 



© 2012 Macmillan Publishers Limited. All rights reserved 




Ingeborg Levin is at the Institut für Umweltphysik, Heidelberg University, Heidelberg D-69120, Germany.
e-mail: ingeborg.levin@iup.uni-heidelberg.de


1. Friedlingstein, P. et al. J. Clim. 19, 3337-3353 (2006).
2. Sarmiento, J. L. et al. Biogeosciences 7, 2351-2367 (2010).
3. Zhao, M. & Running, S. W. Science 329, 940-943 (2010).
4. Ballantyne, A. P., Alden, C. B., Miller, J. B., Tans, P. P. & White, J. W. C. Nature 488, 70-72 (2012).
5. Marland, G., Hamal, K. & Jonas, M. J. Ind. Ecol. 13, 4-7 (2009).
6. http://cdiac.ornl.gov/trends/emis/prelim_2009_2010_estimates.html
7. Gruber, N. Phil. Trans. R. Soc. A 369, 1980-1996 (2011).
8. Piao, S. et al. Nature 451, 49-52 (2008).
9. Knorr, W. Geophys. Res. Lett. 36, L21710 (2009).
10. Guan, D., Liu, Z., Geng, Y., Lindner, S. & Hubacek, K. Nature Clim. Change advance online publication http://dx.doi.org/10.1038/nclimate1560 (2012).
11. www.esrl.noaa.gov/gmd/ccgg/trends


Figure 1 | How to silence a toxic RNA. a, Most messenger RNAs, such as that encoded by the DMPK gene, are processed by splicing proteins and then rapidly exported from the nucleus into the cytoplasm, where they are translated into proteins. MBNL splicing proteins interact with many mRNAs, including the DMPK mRNA. b, Expansion of a repeat region in DMPK mRNA causes myotonic dystrophy type 1 by sequestering MBNL proteins and retaining them in the nucleus, thereby affecting the splicing and expression of many cellular RNAs. c, Wheeler and colleagues’ describe the successful use of antisense oligonucleotides (short RNA-like molecules) to ameliorate the disease’s symptoms in a mouse model. The oligonucleotides bind to the mutant RNA and selectively induce its destruction in the nucleus by the enzyme RNase H.


and correction of RNA splicing defects and the resultant myotonia. Astonishingly, the most effective oligonucleotides continued to confer some benefit up to a year after treatment had been discontinued.
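The gapmer architecture described above (stabilized ends flanking a central stretch that supports RNase H cleavage) can be captured as a small data structure. This is a toy sketch: the `Gapmer` class is hypothetical, the 5-10-5 wing/gap split is an illustrative layout rather than the design reported by Wheeler and colleagues, and the minimum gap length for RNase H activity is a placeholder parameter, not an established threshold.

```python
from dataclasses import dataclass

@dataclass
class Gapmer:
    """Toy description of a gapmer: chemically modified wings flanking a central gap."""
    sequence: str  # antisense sequence, 5'->3' (DNA letters)
    wing5: int     # number of modified nucleotides at the 5' end
    wing3: int     # number of modified nucleotides at the 3' end

    def segments(self):
        """Return (5' wing, central gap, 3' wing) as three substrings."""
        gap_end = len(self.sequence) - self.wing3
        return (self.sequence[:self.wing5],
                self.sequence[self.wing5:gap_end],
                self.sequence[gap_end:])

    def supports_rnase_h(self, min_gap=8):
        # Assumption: the central gap, which lacks the wing modifications, must be at
        # least `min_gap` nucleotides long to form a duplex RNase H can cleave.
        return len(self.segments()[1]) >= min_gap

def annotate(g):
    """Lower case marks modified wing nucleotides; upper case marks the central gap."""
    left, gap, right = g.segments()
    return left.lower() + gap.upper() + right.lower()

g = Gapmer("CAGCAGCAGCAGCAGCAGCA", wing5=5, wing3=5)  # hypothetical 20-mer, 5-10-5 layout
print(annotate(g), g.supports_rnase_h())
```

Keeping the wing lengths as parameters makes it easy to see how shrinking the gap would, under the stated assumption, remove the RNase H-competent stretch.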

Animal tests of RNA-directed therapies for muscle diseases such as DM1 have had limited success so far’ °. So, why this apparent breakthrough? It comes down to the therapy’s probable site of action: the nucleus. Most mRNAs are synthesized and spliced in the nucleus, then rapidly exported to the cytoplasm. But gapmer oligonucleotides induce degradation of RNA by RNase H, which is enriched in the nucleus and almost absent from the cytoplasm. Wheeler and colleagues, and a second research group working independently’, reasoned that the nuclear retention of expanded-repeat RNAs could make them good targets for RNase H. Consistent with this idea, the authors describe how oligonucleotides designed to target RNAs that are rapidly exported to the cytoplasm were ineffective at decreasing their expression in muscle. By contrast, oligonucleotides targeting a nuclear RNA (the long non-coding RNA Malat1) demonstrated similar efficacy to that seen when targeting the expanded-repeat RNA.
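The argument for a nuclear site of action can be phrased as two competing processes: a transcript in the nucleus is either cleaved by RNase H or exported first, and because RNase H is almost absent from the cytoplasm, export effectively lets the RNA escape. Under the simplifying assumption that both are first-order processes, the chance of cleavage before export is k_cleave / (k_cleave + k_export). The rates below are hypothetical, chosen only to contrast a rapidly exported mRNA with a nuclear-retained one, and the function name is invented.

```python
def fraction_cleaved_in_nucleus(k_cleave, k_export):
    """Probability that a transcript is cleaved before it leaves the nucleus,
    for two competing first-order processes (rates in the same units, e.g. per hour).
    Cleavage in the cytoplasm is ignored."""
    return k_cleave / (k_cleave + k_export)

k_cleave = 0.5  # hypothetical RNase H cleavage rate in the nucleus, per hour
for label, k_export in [("rapidly exported mRNA", 20.0),
                        ("nuclear-retained RNA", 0.05)]:
    print(f"{label}: {fraction_cleaved_in_nucleus(k_cleave, k_export):.2f} cleaved")
```

With the same cleavage rate, only the transcript that lingers in the nucleus is efficiently destroyed, which matches the pattern the authors report for exported versus nuclear targets.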

What are the implications of Wheeler and colleagues’ results? They inspire optimism that previous challenges faced by researchers looking at antisense oligonucleotide therapies for DM1 and other neuromuscular diseases are surmountable, although significant hurdles remain regarding safety and delivery to affected tissues other than skeletal muscle, such as the heart and brain. The authors’ findings also suggest that gapmer-based strategies might be suitable for the treatment of other disorders caused by expansions of repeated DNA sequences (such as amyotrophic lateral sclerosis and frontotemporal dementia’”"’), provided that the mutant RNA tends to remain in the cell’s nucleus longer than the normal RNA. Furthermore, appropriately designed gapmer oligonucleotides may aid researchers in defining the functions of specific nuclear non-coding RNAs, some of which have key roles in regulating gene expression.

However, as promising as these findings are for the prospect of DM1 therapeutics, they also serve as a cautionary tale for the applications of antisense oligonucleotides. First, given our limited understanding of the roles of nuclear non-coding RNAs and the likelihood that their sensitivity to this technology is enhanced, care must be taken in oligonucleotide design to avoid potentially deleterious off-target effects. Second, developing similarly potent therapies for target mRNAs that are rapidly exported from the nucleus may require the use of oligonucleotides that do not act through RNase H. Third, therapeutic success in a mouse model is still a long way from effective application in humans. However, the path to success now seems clearly visible. ■
50 Years Ago


The importance of lunar natural resources for the future of space exploration can scarcely be exaggerated. Lunar resources will not only play an important part in the establishment of a lunar base by providing life support materials and vehicle fuels but will also be an important, and perhaps a limiting, factor in the logistics of interplanetary space exploration. Certainly, only the most cursory exploration of the solar system could be conducted using either existing or planned propulsion systems so long as the rocket vehicles must lift all their fuel from the surface of the Earth. A lunar fuel source, on the other hand, would provide an extremely convenient low-gravity refuelling station in space.

From Nature 4 August 1962 


100 Years Ago 


A note bearing on the much-debated question of the age of the earth is given in the Proceedings of the Tokyo Mathematico-physical Society by S. Suzuki. The calculation refers to the time taken for the present crust of the earth to solidify. A result is obtained on the supposition that the heat of fusion liberated by the solidification of the crust supplies the heat lost by radiation, and it is further assumed that the effect of the curvature of the earth’s surface may be neglected. According to these hypotheses the calculated time varies between 30 and 300 million years, according to the kind of rock (gneiss, basalt, or granite) assumed in the calculations. The difficulty is, of course, our imperfect knowledge of the experimental data on which the conclusions are based.

[Editor's note: Latest estimates give Earth's age as 4.5 billion years.]
From Nature 1 August 1912 
A rare gene variant has been found that decreases the peptide deposition seen in the brains of people with Alzheimer’s disease. The mutation may also slow the normal cognitive decline that occurs with age. SEE LETTER P.96


BART DE STROOPER & THIERRY VOET 


Although our understanding of the processes causing Alzheimer’s disease remains hazy, one clear defining feature of the disease is an accumulation in the brain of deposits of amyloid-β peptides. These form following cleavage of the amyloid precursor protein (APP), and mutations in the APP gene can affect this cleavage and/or the biophysical properties of the peptides’, leading to their aggregation into toxic peptide assemblies”* that propagate in the brain and promote neurodegeneration**. All APP mutations identified so far increase levels of these toxic amyloid-β species, but might APP mutations exist that act in the opposite way, by reducing peptide accumulation and protecting against neurodegeneration? On page 96 of this issue, Jonsson et al.° report finding the proverbial needle in a haystack: a rare variant of the APP gene that protects against Alzheimer’s disease.

The mutation causes an amino-acid substitution in APP (alanine to threonine; A673T) close to the site at which the protein is cleaved by the enzyme β-secretase (Fig. 1). The authors show that fewer amyloid-β (Aβ) peptides are generated in cultured cells that express this gene variant. It is worth noting that a different amino-acid substitution at exactly the same site (alanine to valine; A673V) can cause Alzheimer’s disease’, emphasizing the pathological relevance of this amino-acid residue.
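The shorthand A673T packs three pieces of information: the original amino acid (A, alanine), its position in the protein (673) and the replacement (T, threonine); A673V likewise denotes alanine to valine. The small parser below just unpacks that one-letter notation. Its name is hypothetical, it handles only simple single-residue substitutions, and its lookup table covers only the three amino acids mentioned here.

```python
import re

# Subset of the one-letter amino-acid code needed for this example
AMINO_ACIDS = {"A": "alanine", "T": "threonine", "V": "valine"}

def parse_substitution(code):
    """Split one-letter protein-change notation such as 'A673T' into
    (original residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a simple substitution: {code!r}")
    ref, pos, alt = match.groups()
    # Fall back to the bare letter for residues not in the small table above.
    return AMINO_ACIDS.get(ref, ref), int(pos), AMINO_ACIDS.get(alt, alt)

for variant in ("A673T", "A673V"):
    print(variant, parse_substitution(variant))
```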

Fascinatingly, Jonsson and colleagues also found that, in a cohort of people over the age of 80, those who were heterozygous for the A673T variant (one of their two copies of the APP gene was mutated) performed better in a test of mental capacity than did control subjects. The authors derive from this observation the rather bold conclusion that the identified variant protects not only against Alzheimer’s disease but also against the normal mild cognitive decline that is associated with old age®.


[Figure 1 labels: alanine to threonine, protective; alanine to valine, causative.]


Figure 1 | Amyloid precursor protein and Alzheimer’s disease. The amyloid precursor protein (APP) is cleaved by the enzymes β-secretase and γ-secretase into amyloid-β (Aβ) peptides (scissor symbols represent the cleavage sites). An accumulation of Aβ peptides is seen in the brains of patients with Alzheimer’s disease, and multiple mutations that cause changes in the amino-acid sequence of APP (blue bars) are associated with the disease'*"*. By contrast, Jonsson and colleagues’ have identified a mutation in the APP gene that reduces Aβ accumulation and that protects against Alzheimer’s disease. The mutation results in an alanine-to-threonine amino-acid substitution close to the β-secretase cleavage site (red bar). Interestingly, an alanine-to-valine substitution at the same site can cause Alzheimer’s disease’.
In their search for rare APP mutations with a significant effect on the risk of developing Alzheimer’s disease, Jonsson and colleagues used a unique collection of genotypic, medical and genealogical records of Icelandic people gathered by the company deCODE Genetics. The authors screened the full genome sequences of 1,795 Icelanders for variants in and outside the APP gene and used computational methods to predict these variant sequences in approximately 370,000 compatriots. This combination of sequence and relationship information*” enabled the authors to test the association of the APP variants they identified with common late-onset Alzheimer’s disease. They found that the protective A673T variant was significantly more common in a group of over-85-year-olds without Alzheimer’s disease (a frequency of 0.62%), and even more so in cognitively intact over-85-year-olds (0.79%), than in patients with Alzheimer’s disease (0.13%). Although these findings were largely based on predicted genotypes, the authors demonstrated the accuracy of their calculations by subsequently testing for the A673T gene variant in thousands of the individuals they studied.
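For a sense of scale, the group frequencies quoted above can be turned into odds ratios relative to the patients with Alzheimer's disease. This is back-of-the-envelope arithmetic, not the statistical analysis performed by Jonsson and colleagues: it simply treats each percentage as the fraction of the group carrying the variant and ignores sample sizes, relatedness and confidence intervals. The function name is hypothetical.

```python
def odds(p):
    """Convert a frequency (fraction between 0 and 1) into odds."""
    return p / (1 - p)

def odds_ratio(freq_group, freq_reference):
    """Odds ratio of the variant in one group relative to a reference group,
    with frequencies given as fractions (e.g. 0.0062 for 0.62%)."""
    return odds(freq_group) / odds(freq_reference)

freq_ad = 0.0013       # patients with Alzheimer's disease
freq_elderly = 0.0062  # over-85s without Alzheimer's disease
freq_intact = 0.0079   # cognitively intact over-85s

print(f"over-85s without disease vs patients: {odds_ratio(freq_elderly, freq_ad):.1f}")
print(f"cognitively intact over-85s vs patients: {odds_ratio(freq_intact, freq_ad):.1f}")
```

Because the variant is rare, the odds ratios here are close to the simple ratios of the percentages.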

Jonsson and colleagues also used in vitro studies to show that the mutated APP protein undergoes about 40% less cleavage by β-secretase compared with wild-type APP. Together, β-secretase and γ-secretase are responsible for the cleavage of Aβ peptides from APP (Fig. 1); the enzymes are currently being assessed as targets for the treatment of Alzheimer’s disease’, and the finding of reduced Aβ accumulation following impaired β-secretase cleavage is encouraging for this line of inquiry. At first glance, one is tempted to draw the exciting conclusion that a lifelong, moderate lowering of Aβ is all that is needed to postpone Alzheimer’s disease and to maintain good cognition. Indeed, researchers have long been struggling with the tantalizing question of how far levels of Aβ must be lowered to be effective in the treatment or prevention of Alzheimer’s disease".

However, it remains to be seen how these in vitro observations translate to the human brain, where the protective A673T variant is expressed together with a wild-type copy of the gene. If the translation is linear, then the 40% decrease in Aβ levels observed in vitro would result in a 20% decrease in Aβ in people carrying the mutation. It would be extremely instructive to test this prediction using blood, cerebrospinal fluid or cells from individuals carrying the mutation.
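The 20% figure follows from simple averaging, assuming that the mutant and wild-type copies of APP are expressed equally and that their contributions to Aβ production add linearly (the same linear assumption made above). The function name is hypothetical.

```python
def heterozygote_abeta(mutant_reduction, mutant_share=0.5):
    """Relative Aβ production in a carrier, assuming each APP copy contributes
    in proportion to its share of expression and effects add linearly."""
    wild_type_share = 1 - mutant_share
    return wild_type_share * 1.0 + mutant_share * (1 - mutant_reduction)

level = heterozygote_abeta(0.40)  # 40% less cleavage of the mutant protein in vitro
print(f"Predicted Aβ in carriers: {level:.0%} of normal ({1 - level:.0%} decrease)")
```

Measuring Aβ in carriers, as suggested above, would show whether the real relationship departs from this linear expectation.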
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Figure 1 | How to silence a toxic RNA. a, Most messenger RNAs, such as that encoded by the DMPK 
gene, are processed by splicing proteins and then rapidly exported from the nucleus into the cytoplasm, 
where they are translated into proteins. MBNL splicing proteins interact with many mRNAs, 

including the DMPK mRNA. b, Expansion ofa repeat region in DMPK mRNA causes myotonic 
dystrophy type 1 by sequestering MBNL proteins and retaining them in the nucleus, thereby affecting 
the splicing and expression of many cellular RNAs. c, Wheeler and colleagues’ describe the successful 
use of antisense oligonucleotides (short RNA-like molecules) to ameliorate the disease’s symptoms in a 
mouse model. The oligonucleotides bind to the mutant RNA and selectively induce its destruction 


in the nucleus by the enzyme RNase H. 


and correction of RNA splicing defects and the 
resultant myotonia. Astonishingly, the most 
effective oligonucleotides continued to confer 
some benefit up to a year after treatment had 
been discontinued. 

Animal tests of RNA-directed therapies for 
muscle diseases such as DM1 have had lim- 
ited success so far’ °. So, why this apparent 
breakthrough? It comes down to the therapy’s 
probable site of action: the nucleus. Most 
mRNAs are synthesized and spliced in the 
nucleus, then rapidly exported to the cyto- 
plasm. But gapmer oligonucleotides induce 
degradation of RNA by RNase H, which is 
enriched in the nucleus and almost absent 
from the cytoplasm. Wheeler and colleagues, 
and a second research group working indepen- 
dently’, reasoned that the nuclear retention 
of expanded-repeat RNAs could make them 
good targets for RNase H. Consistent with this 
idea, the authors describe how oligonucleo- 
tides designed to target RNAs that are rapidly 
exported to the cytoplasm were ineffective 
at decreasing their expression in muscle. By 
contrast, oligonucleotides targeting a nuclear 
RNA (the long non-coding RNA Malat1) dem- 
onstrated similar efficacy to that seen when 
targeting the expanded-repeat RNA. 
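The logic of nuclear targeting can be illustrated with a toy kinetic picture, which is our own simplification and not the authors' analysis. Once an oligonucleotide is bound, a transcript in the nucleus is either cleaved by RNase H or exported first; after export it is assumed to be safe, because RNase H is almost absent from the cytoplasm. Treating cleavage and export as competing first-order processes, the chance of destruction is k_cleave / (k_cleave + k_export). The rate constants below are hypothetical, chosen only to contrast a rapidly exported mRNA with a nuclear-retained one.

```python
def fraction_cleaved(k_cleave: float, k_export: float) -> float:
    """Probability that a transcript is cleaved in the nucleus before export.

    Toy model: nuclear RNase H cleavage (rate k_cleave) and nuclear export
    (rate k_export) are competing first-order processes; whichever happens
    first decides the transcript's fate. Export is treated as an escape,
    because RNase H is assumed to be essentially absent from the cytoplasm.
    """
    if k_cleave < 0 or k_export < 0 or k_cleave + k_export == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_cleave / (k_cleave + k_export)


# Hypothetical rate constants (per minute), for illustration only.
K_CLEAVE = 0.1
for label, k_export in [("rapidly exported mRNA", 1.0),
                        ("nuclear-retained repeat RNA", 0.001)]:
    print(f"{label}: {fraction_cleaved(K_CLEAVE, k_export):.1%} cleaved")
# rapidly exported mRNA: 9.1% cleaved
# nuclear-retained repeat RNA: 99.0% cleaved
```

The point is qualitative: under these assumptions, the same oligonucleotide destroys most of a transcript that lingers in the nucleus but little of one that leaves quickly.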

What are the implications of Wheeler and 
colleagues’ results? They inspire optimism 
that previous challenges faced by researchers 
looking at antisense oligonucleotide therapies 
for DM1 and other neuromuscular diseases
are surmountable — although significant
hurdles remain regarding safety and delivery 
to affected tissues other than skeletal muscle, 
such as the heart and brain. The authors’ find- 
ings also suggest that gapmer-based strategies 
might be suitable for the treatment of other 
disorders caused by expansions of repeated 
DNA sequences (such as amyotrophic lateral 
sclerosis and frontotemporal dementia’”"’), 
provided that the mutant RNA tends to remain 
in the cell’s nucleus longer than the normal 
RNA. Furthermore, appropriately designed 
gapmer oligonucleotides may aid researchers 
in defining the functions of specific nuclear 
non-coding RNAs, some of which have key 
roles in regulating gene expression. 

However, as promising as these findings are 
for the prospect of DM1 therapeutics, they also 
serve as a cautionary tale for the applications 
of antisense oligonucleotides. First, given our 
limited understanding of the roles of nuclear 
non-coding RNAs and the likelihood that their 
sensitivity to this technology is enhanced, care 
must be taken in oligonucleotide design to 
avoid potentially deleterious off-target effects. 
Second, developing similarly potent therapies 
for target mRNAs that are rapidly exported 
from the nucleus may require the use of oligo- 
nucleotides that do not act through RNase H. 
Third, therapeutic success in a mouse model 
is still a long way from effective application in 
humans. However, the path to success now 
seems clearly visible.
50 Years Ago


The importance of lunar natural 
resources for the future of space 
exploration can scarcely be 
exaggerated. Lunar resources will 
not only play an important part 
in the establishment of a lunar 
base by providing life support 
materials and vehicle fuels but 
will also be an important, and 
perhaps a limiting, factor in the 
logistics of interplanetary space 
exploration. Certainly, only the 
most cursory exploration of the 
solar system could be conducted 
using either existing or planned 
propulsion systems so long as the 
rocket vehicles must lift all their 
fuel from the surface of the Earth. 
A lunar fuel source, on the other 
hand, would provide an extremely 
convenient low-gravity refuelling 
station in space. 

From Nature 4 August 1962 


100 Years Ago 


A note bearing on the much- 
debated question of the age of the 
earth is given in the Proceedings 
of the Tokyo Mathematico- 
physical Society by S. Suzuki. The 
calculation refers to the time taken 
for the present crust of the earth to 
solidify. A result is obtained on the 
supposition that the heat of fusion 
liberated by the solidification of 
the crust supplies the heat lost by 
radiation, and it is further assumed 
that the effect of the curvature
of the earth’s surface may be
neglected. According to these 
hypotheses the calculated time 
varies between 30 and 300 million 
years, according to the kind of rock 
(gneiss, basalt, or granite) assumed 
in the calculations. The difficulty 
is, of course, our imperfect 
knowledge of the experimental 
data on which the conclusions
are based.

[Editor's note: Latest estimates give 
Earth's age as 4.5 billion years.]
From Nature 1 August 1912 
A rare gene variant has been found that decreases the peptide deposition seen in
the brains of people with Alzheimer’s disease. The mutation may also slow the 
normal cognitive decline that occurs with age. SEE LETTER P.96 


BART DE STROOPER & THIERRY VOET 


Although our understanding of the
processes causing Alzheimer’s
disease remains hazy, one clear defin-
ing feature of the disease is an accumulation
in the brain of deposits of amyloid-β peptides.
These form following cleavage of the amyloid 
precursor protein (APP), and mutations in the 
APP gene can affect this cleavage and/or the 
biophysical properties of the peptides’, leading 
to their aggregation into toxic peptide assem- 
blies”* that propagate in the brain and promote 
neurodegeneration**. All APP mutations 
identified so far increase levels of these toxic 
amyloid-β species, but might APP mutations
exist that act in the opposite way, by reducing 
peptide accumulation and protecting against 
neurodegeneration? On page 96 of this issue, 
Jonsson et al.° report finding the proverbial 
needle in a haystack: a rare variant of the APP 
gene that protects against Alzheimer’s disease. 
The mutation causes an amino-acid
substitution in APP (alanine to threonine;
A673T) close to the site at which the protein
is cleaved by the enzyme β-secretase (Fig. 1).
The authors show that fewer amyloid-β (Aβ)
peptides are generated in cultured cells that 
express this gene variant. It is worth noting 
that a different amino-acid substitution 
at exactly the same site (alanine to valine; 
A673V) can cause Alzheimer’s disease’, 
emphasizing the pathological relevance of 
this amino-acid residue. 

Fascinatingly, Jonsson and colleagues also 
found that, in a cohort of people over the age 
of 80, those who were heterozygous for the 
A673T variant (one of their two copies of the 
APP gene was mutated) performed better 
in a test of mental capacity than did control 
subjects. The authors derive from this obser- 
vation the rather bold conclusion that the 
identified variant not only protects against 
Alzheimer’s disease but also against the normal 
mild cognitive decline that is associated 
with old age®. 
Figure 1 | Amyloid precursor protein and Alzheimer’s disease. The amyloid precursor protein (APP)
is cleaved by the enzymes β-secretase and γ-secretase into amyloid-β (Aβ) peptides (scissor symbols
represent the cleavage sites). An accumulation of Aβ peptides is seen in the brains of patients with
Alzheimer’s disease, and multiple mutations that cause changes in the amino-acid sequence of APP (blue
bars) are associated with the disease'*"*. By contrast, Jonsson and colleagues’ have identified a mutation in
the APP gene that reduces Aβ accumulation and that protects against Alzheimer’s disease. The mutation
results in an alanine-to-threonine amino-acid substitution close to the β-secretase cleavage site (red bar).
Interestingly, an alanine-to-valine substitution at the same site can cause Alzheimer’s disease’.
In their search for rare APP mutations with a
significant effect on the risk of developing
Alzheimer’s disease, Jonsson and colleagues 
used a unique collection of genotypic, medical 
and genealogical records of Icelandic people 
gathered by the company deCODE Genet- 
ics. The authors screened the full genome 
sequences of 1,795 Icelanders for variants 
in and outside the APP gene and used com- 
putational methods to predict these variant 
sequences in approximately 370,000 compa- 
triots. This combination of sequence and rela- 
tionship information*” enabled the authors to 
test the association of the APP variants they 
identified with common late-onset Alzhei- 
mer’s disease. They found that the protective 
A673T variant was significantly more com- 
mon in a group of over-85-year-olds without 
Alzheimer’s disease (its frequency was 0.62%)
— and even more so in cognitively intact over-
85-year-olds (0.79%) — than in patients with
Alzheimer’s disease (0.13%). Although these
findings were largely based on predicted geno- 
types, the authors demonstrated the accuracy 
of their calculations by subsequently testing 
for the A673T gene variant in thousands of 
the individuals they studied. 
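A back-of-the-envelope comparison of the quoted frequencies conveys the size of the effect. This is a plain ratio of the percentages given above, not an odds ratio or a significance test, both of which would require the underlying case and control counts; the group labels and function name are ours.

```python
# Frequencies of the A673T variant quoted in the text, in percent.
FREQ = {
    "over-85s without Alzheimer's disease": 0.62,
    "cognitively intact over-85s": 0.79,
    "patients with Alzheimer's disease": 0.13,
}


def fold_enrichment(group: str, reference: str = "patients with Alzheimer's disease") -> float:
    """Ratio of the variant's frequency in `group` to its frequency in `reference`."""
    return FREQ[group] / FREQ[reference]


for group in ("over-85s without Alzheimer's disease", "cognitively intact over-85s"):
    print(f"{group}: {fold_enrichment(group):.1f}-fold")
# over-85s without Alzheimer's disease: 4.8-fold
# cognitively intact over-85s: 6.1-fold
```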

Jonsson and colleagues also used in vitro 
studies to show that the mutated APP pro- 
tein undergoes about 40% less cleavage by 
β-secretase compared with wild-type APP.
Together, β-secretase and γ-secretase are
responsible for the cleavage of Aβ peptides
from APP (Fig. 1); the enzymes are currently
being assessed as targets for the treatment
of Alzheimer’s disease’, and the finding of
reduced Aβ accumulation following impaired
β-secretase cleavage is encouraging for this
line of inquiry. At first glance, one is tempted
to draw the exciting conclusion that a lifelong,
moderate lowering of Aβ is all that is needed to
postpone Alzheimer’s disease and to maintain
good cognition. Indeed, researchers have long
been struggling with the tantalizing question
of how far Aβ levels must be lowered
to be effective in the treatment or prevention
of Alzheimer’s disease".

However, it remains to be seen how these
in vitro observations translate to the human
brain, where the protective A673T variant is
expressed together with a wild-type copy of
the gene. If the effect scales linearly (the mutant
copy producing 40% less Aβ and the wild-type
copy unaffected), then the 40% decrease in Aβ
levels observed in vitro would result in a 20%
decrease in Aβ in people carrying the mutation.
It would be extremely instructive to test this
prediction using blood, cerebrospinal fluid or
cells from individuals carrying the mutation.
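The 20% figure follows from averaging the two gene copies. The sketch below makes that arithmetic explicit under the same linear assumption: each APP copy contributes equally, and the 40% in vitro reduction applies only to the mutant copy. The function name is ours.

```python
def carrier_abeta_level(mutant_reduction: float, mutant_copies: int = 1, total_copies: int = 2) -> float:
    """Aβ production of a carrier relative to a non-carrier (non-carrier = 1.0).

    Assumes each APP copy contributes equally and that the in vitro reduction
    applies only to the mutant copy, i.e. the linear scenario in the text.
    """
    if not 0 <= mutant_copies <= total_copies:
        raise ValueError("mutant_copies must be between 0 and total_copies")
    wild_type = (total_copies - mutant_copies) * 1.0
    mutant = mutant_copies * (1.0 - mutant_reduction)
    return (wild_type + mutant) / total_copies


heterozygote = carrier_abeta_level(0.40)
print(f"Predicted Aβ decrease in heterozygous carriers: {1.0 - heterozygote:.0%}")  # 20%
```

The same function gives a 40% decrease for a hypothetical homozygous carrier (`mutant_copies=2`), which is simply the in vitro figure.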


Despite this promising prospect, the question
remains as to whether a reduction in Aβ
explains the protective effect of the gene vari-
ant identified by Jonsson and colleagues. Here,
it is worth keeping in mind a previously iden-
tified Alzheimer’s-disease-causing mutation,
the A673V APP gene variant’. This mutation
increases Aβ generation but causes dementia
only in people in whom both gene copies are
mutated, not just one. It also affects not only
the amount, but also, and importantly for this
discussion, the biophysical properties of the
Aβ that is generated. It seems that the mutated
protein interacts with wild-type Aβ to prevent
the generation of toxic Aβ assemblies. Given
that another mutation at the same site in the
APP protein also affects the aggregation prop-
erties of Aβ peptides”, the possibility that Jons-
son and colleagues’ A673T mutation exerts its
protective effects by altering Aβ aggregation
should be considered. This more qualitative
concept of Aβ toxicity contrasts with the idea
that only an increase in Aβ levels can cause dis-
ease — and evidence supporting this insight is
rapidly mounting”.

Further work is certainly needed to verify 
whether the A673T mutation protects against 
age-related cognitive decline. Jonsson et al. 
report that A673T carriers perform better in 
cognitive tests than do control subjects, but 
one wonders whether this can be confirmed 
by other measurements of cognitive function, 
and whether confounding factors compli- 
cate the interpretation of the reported result. 
For example, although there were no known 
cases of Alzheimer’s disease in the control 
population, many other conditions, such as 
Parkinson's disease or depression (which were 
not excluded in this assessment), can also 
negatively affect mental capacity. So Jonsson 
and colleagues’ proposal® that “Alzheimer’s 
disease may represent the extreme of the age- 
related decline in cognitive function” may 
yet prove to be a premature interpretation of 
their findings. 

Nevertheless, the identification of a pro- 
tective APP gene variant is certainly exciting, 
and it will be interesting to watch for the iden- 
tification of other protective gene variants, 
for example, mutations in the gene encod-
ing β-secretase that might inhibit its expres-
sion or its proteolytic activity. As with many
genetic findings, more years of hard work will 
be needed to assess the clinical and therapeutic 
implications of such findings. But if the prelim- 
inary — and quite spectacular — conclusions 
of Jonsson et al. regarding the mechanism of 
action of the A673T mutation, and its impli- 
cations for cognition, can be confirmed, then 
a lifelong suppression of Aβ production by as
little as 20% may one day become the ‘fountain
of youth’ for the brain.
Strongly interacting photons


A fine marriage between atomic and optical physics has produced a medium that is 
transparent to single photons but opaque to multiple photons. The finding heralds 
the development of devices such as single-photon switches. SEE LETTER P.57 


THAD G. WALKER 


Can photons be made to interact strongly
with each other? Until recently, mater-
ials with nonlinear optical properties
could mediate photon-photon interactions
that were weak at best. These weak interactions 
have previously been artificially enhanced 
using devices known as optical cavities', which 
make the photons repeat their encounters 
thousands to millions of times. On page 57 of 
this issue, Peyronel et al.” demonstrate a new 
material in which single photons propagate 
freely, but interact so strongly with each other 
that when just two photons are present one 
is quickly absorbed. The result opens up the 
possibility of realizing concepts such as single- 
photon switches, deterministic photon-based 
quantum logic, and quantum gases of strongly 
interacting photons*. 

We have known since the dawn of quantum 
physics a century ago that light consists of par-
ticles, called photons, of energy $hf$, where $h$ is
Planck’s constant and $f$ is the light’s frequency.
Photons usually interact extremely weakly 
with each other, but strongly with the charged 
particles that make up matter. In most mater-
ials, the optical response is linear — a beam
composed of many photons scatters and
moves from place to place in the same way that 
single photons do. Inside nonlinear materials, 
however, the optical response is altered when 
multiple photons are present. The motion of 
a particular photon depends on the proper-
ties — most notably the number — of other
photons in its vicinity. Until recently, however, 
available nonlinear materials required large 
numbers of photons to be present in order for 
them to noticeably affect each other. Peyronel 
et al.” combined several recent developments 
in atomic and optical physics to produce a 
novel nonlinear medium that is transpar- 
ent to single photons yet opaque to multiple 
photons (Fig. 1). 

The largest nonlinear optical effects achiev- 
able in atoms occur when a light field renders 
the atoms transparent. Consider a sample of 
atoms that have three energy levels: a ground
state $E_g$, an excited state $E_r$ and an intermediate
level $E_e$ (see Fig. 1b of the paper’). Photons of
frequency $f_p$ that are directed into the sample
and obey the Bohr equation, $hf_p = E_e - E_g$, will
normally be absorbed. However, when a strong
‘control’ laser of frequency $f_c$ is also shone on
the sample, the atoms become transparent to
frequency $f_p$ if the condition $h(f_p + f_c) = E_r - E_g$
is satisfied. This electromagnetically induced 
transparency (EIT; ref. 3) puts the atoms and 
photons into collective excitations called 
polaritons. In Peyronel and colleagues’ exper- 
iment, the control laser changes the trans- 
mission of the atomic gas from essentially 
zero to 60%. The EIT condition is extremely 
sensitive: photons that obey it are transmit- 
ted with high probability, whereas those 
that violate it are absorbed normally. This is 
basically an optically controlled switch’.
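To make the two resonance conditions concrete, here is a toy classifier. The labels are our own: E_g, E_e and E_r for the ground, intermediate and upper levels, and f_p and f_c for the probe and control frequencies. The level values are hypothetical round numbers rather than a real atom, and a sharp tolerance window stands in for the real, smooth line shape. It also applies E = hf to turn a frequency into a photon energy.

```python
# Photon energy E = h f, with h in joule-seconds (exact SI value).
PLANCK_H = 6.62607015e-34


def photon_energy(frequency_hz: float) -> float:
    return PLANCK_H * frequency_hz


def probe_fate(f_probe, f_control, levels, linewidth_hz):
    """Toy three-level (ladder) picture of EIT.

    `levels` holds (E_g, E_e, E_r) expressed as E/h in hertz. A probe photon on
    the one-photon (Bohr) resonance f_p = (E_e - E_g)/h is absorbed, unless a
    control field satisfies the two-photon condition f_p + f_c = (E_r - E_g)/h,
    in which case it is transmitted. `linewidth_hz` is a hard cut-off standing
    in for the real line shape.
    """
    e_g, e_e, e_r = levels
    if abs(f_probe - (e_e - e_g)) > linewidth_hz:
        return "off resonance"
    if f_control is not None and abs(f_probe + f_control - (e_r - e_g)) <= linewidth_hz:
        return "transmitted"
    return "absorbed"


# Hypothetical round-number levels (E/h, in Hz); not a real atom.
LEVELS = (0.0, 400e12, 1000e12)
print(probe_fate(400e12, None, LEVELS, 1e6))             # absorbed
print(probe_fate(400e12, 600e12, LEVELS, 1e6))           # transmitted
print(probe_fate(400e12, 600e12 + 10e6, LEVELS, 1e6))    # absorbed
print(f"{photon_energy(400e12):.3e} J per probe photon")  # 2.650e-19 J per probe photon
```

The third call shows the sensitivity described above: detuning the control laser by just 10 MHz out of 600 THz is enough, in this toy, to turn transmission back into absorption.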

When the upper level r is a state of large 
Figure 1 | A stream of single photons. Peyronel et al.* have directed a beam of overlapping photons
into an atomic gas in which single photons are converted into collective excitations known as Rydberg
polaritons. The polaritons, which can be thought of as spheres comprising many atoms and one photon,
strongly absorb additional photons. On exiting the gas, the polaritons are converted back to individual,
non-overlapping photons.


principal quantum number n (a Rydberg 
state’), the EIT condition can be easily vio- 
lated by weak interactions between the atoms. 
For an n of about 100, a single Rydberg atom 
will cause a violation of the EIT condition for 
all other atoms within a ‘blockade radius’ of 
10 micrometres. This Rydberg blockade pro- 
duces record nonlinearities, as shown recently 
by Adams and colleagues®, and has been 
used to entangle neutral atoms separated by 
micrometre-scale distances”*. A Rydberg 
polariton can be thought of as a 10-μm sphere
containing many ground-state atoms and 
one Rydberg atom — or, equivalently, many 
atoms and one photon. Should other photons 
enter a volume already occupied by a Rydberg 
polariton, the blockade effect causes a viola- 
tion of the EIT condition, so the photons 
are absorbed rather than transmitted. Note 
that if the atom density is low, as in previous 
experiments’, the absorption probability may 
still be small. 
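The blockade radius can be estimated with a widely used rule of thumb: the van der Waals interaction between two Rydberg atoms falls off as C6/R^6, and the blockade extends out to the distance at which this shift equals the EIT linewidth, giving R_b = (C6/linewidth)^(1/6). The C6 coefficient and linewidth below are hypothetical, picked so that the result lands at the 10 micrometres quoted above.

```python
def blockade_radius_um(c6_mhz_um6: float, eit_linewidth_mhz: float) -> float:
    """Distance at which the van der Waals shift C6 / R**6 equals the EIT linewidth."""
    return (c6_mhz_um6 / eit_linewidth_mhz) ** (1.0 / 6.0)


def eit_blocked(distance_um: float, c6_mhz_um6: float, eit_linewidth_mhz: float) -> bool:
    """True if a Rydberg atom at this distance shifts the upper level by more than
    the EIT linewidth, so that a second photon there is absorbed, not transmitted."""
    shift_mhz = c6_mhz_um6 / distance_um ** 6
    return shift_mhz > eit_linewidth_mhz


# Hypothetical values, chosen so that the radius matches the ~10 um in the text.
C6 = 1.0e6       # MHz * um^6
LINEWIDTH = 1.0  # MHz
print(f"blockade radius ~ {blockade_radius_um(C6, LINEWIDTH):.1f} um")  # blockade radius ~ 10.0 um
for r in (5.0, 20.0):
    print(r, "um:", "blocked" if eit_blocked(r, C6, LINEWIDTH) else "transparent")
```

The steep R^-6 dependence is why the blockade behaves almost like a hard sphere: halving the distance raises the shift 64-fold.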

The final, essential ingredient needed to 
generate strong photon-photon interactions 
at the two-photon level is an atomic cloud of 
such high density that when two or more pho- 
tons enter a blockade volume, all but one are 
absorbed within that volume, leaving a single 
Rydberg polariton. This ‘photon blockade’ is 
the novelty of Peyronel and colleagues’ study. 
Their experiment reveals that a multi-photon 
incident light beam is converted, within a few 
micrometres, into a beam of single photons, 
with a small (less than 0.09) probability that 
two photons will leave the atomic gas at the 
same time. Interestingly, even though their 
sample is large enough for several Rydberg 
polaritons to coexist, the authors find that 
(and explain why) only one photon at a time is 
found within the entire sample. 
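An idealized version of the photon blockade can be written as a filter on photon-number statistics. Laser light is usually modelled as Poissonian, for which the normalized second-order correlation g2 equals 1; a perfect blockade that lets at most one photon through per blockade volume drives it to 0, and the small but nonzero two-photon probability reported in the experiment reflects departures from this ideal. The input mean photon number and the one-time-window-per-blockade-volume picture are our assumptions; the 60% transmission is the EIT figure quoted earlier.

```python
import math


def poisson(n: int, mean: float) -> float:
    return math.exp(-mean) * mean ** n / math.factorial(n)


def g2(dist: dict[int, float]) -> float:
    """Normalized second-order correlation <n(n-1)> / <n>^2 of a photon-number distribution."""
    mean_n = sum(n * p for n, p in dist.items())
    pairs = sum(n * (n - 1) * p for n, p in dist.items())
    return pairs / mean_n ** 2


def ideal_photon_blockade(dist: dict[int, float], transmission: float) -> dict[int, float]:
    """Idealized filter: of n >= 1 photons in a blockade volume, all but one are
    absorbed, and the survivor is transmitted with probability `transmission`."""
    p_out = transmission * sum(p for n, p in dist.items() if n >= 1)
    return {0: 1.0 - p_out, 1: p_out}


# Hypothetical input: mean of 0.5 photons per blockade volume (truncated at n = 29).
incoming = {n: poisson(n, 0.5) for n in range(30)}
outgoing = ideal_photon_blockade(incoming, transmission=0.6)
print(f"g2 in: {g2(incoming):.3f}, g2 out: {g2(outgoing):.3f}")  # g2 in: 1.000, g2 out: 0.000
```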

An exciting feature of this experiment is 
that there are several clear avenues towards 
improving the properties of the medium. 
Cooler, denser atomic gases and lasers that 
have a narrower frequency range would
improve the EIT transmission to nearly 100% 
and reduce the overlap of photons from the 
single-photon source. A looming challenge is 
to reconfigure the experiment so that the two- 
photon nonlinearity delays rather than absorbs 
excess photons*. This type of nonlinearity, 
which preserves the number of photons, would 
be extremely useful for quantum-information 
purposes. 

In one respect, Peyronel and colleagues have
demonstrated a high-quality single-photon light
source that has a rate of emission in the mega-
hertz regime, as Dudin and Kuzmich have
shown’ using a related approach. The key capa-
bility of this experiment” — engineering strong
photon-photon interactions at the two-pho-
ton level — should also lead to various other
new possibilities. For example, single-photon
switches, photon detectors of high quantum
efficiency, and non-destructive photon detec-
tion can easily be foreseen as extensions of
this work. The physics of strongly interacting
photons has a bright future.

SYSTEMS BIOLOGY

A cell in a computer


The small genomes of some bacteria could provide the first complete
understanding of a biological system. A new computer model brings
this goal closer by calculating every process in a dividing Mycoplasma cell.


MARK ISALAN 


It has long been a dream in biology to push
reductionism to the limit: to describe a cell
as a set of interacting components and to
capture whole-cell behaviour in a computer
model. A good model doesn’t simply recapitu-
late the observed behaviours that are fed into
it. Rather, the aim is to predict the unknown
effect of any novel perturbation or mutation.
Such goals are very ambitious because of the
challenge of attempting to obtain quantitative
information on every one of the cell’s gene
products and metabolites. Nevertheless, Karr
et al.', writing in Cell, present the most com-
prehensive model of a bacterial cell cycle so far,
built on the basis of individual molecules and
their relationships. Impressively, the model
can predict gene-expression levels and cell-
replication times in the challenging context of
mutations involving gene deletions.

Mycoplasma genitalium is a urogenital 
bacterial parasite that has only 525 genes, 
giving it one of the smallest genomes of any
independently dividing cell — for comparison, 
the gut bacterium Escherichia coli has around 
4,000 genes. Because of their status as one of 
the ‘simplest’ cells, Mycoplasma species are 
rapidly becoming the most measured biologi- 
cal systems in history, and full descriptions 
of their molecular content, in terms of DNA, 
RNA, protein and metabolites, are available’. 
The cells are therefore considered to be the 
ideal target for whole-cell modelling’. 

What is striking about Karr and colleagues’ 
model is the sheer ambition of its scale and its
attention to detail. The authors retrieved (and 
in some cases retested) more than 1,900 exper- 
imentally derived cellular parameters, such as 
enzymatic reaction rates and protein-binding 
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Figure 1 | A stream of single photons. Peyronel et al. 
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principal quantum number n (a Rydberg 
state’), the EIT condition can be easily vio- 
lated by weak interactions between the atoms. 
For an n of about 100, a single Rydberg atom 
will cause a violation of the EIT condition for 
all other atoms within a ‘blockade radius’ of 
10 micrometres. This Rydberg blockade pro- 
duces record nonlinearities, as shown recently 
by Adams and colleagues®, and has been 
used to entangle neutral atoms separated by 
micrometre-scale distances”*. A Rydberg 
polariton can be thought of as a 10-um sphere 
containing many ground-state atoms and 
one Rydberg atom — or, equivalently, many 
atoms and one photon. Should other photons 
enter a volume already occupied by a Rydberg 
polariton, the blockade effect causes a viola- 
tion of the EIT condition, so the photons 
are absorbed rather than transmitted. Note 
that if the atom density is low, as in previous 
experiments’, the absorption probability may 
still be small. 

The final, essential ingredient needed to 
generate strong photon-photon interactions 
at the two-photon level is an atomic cloud of 
such high density that when two or more pho- 
tons enter a blockade volume, all but one are 
absorbed within that volume, leaving a single 
Rydberg polariton. This ‘photon blockade’ is 
the novelty of Peyronel and colleagues’ study. 
Their experiment reveals that a multi-photon 
incident light beam is converted, within a few 
micrometres, into a beam of single photons, 
with a small (less than 0.09) probability that 
two photons will leave the atomic gas at the 
same time. Interestingly, even though their 
sample is large enough for several Rydberg 
polaritons to coexist, the authors find that 
(and explain why) only one photon at a time is 
found within the entire sample. 

An exciting feature of this experiment is 
that there are several clear avenues towards 
improving the properties of the medium. 
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have a narrower frequency range would 
improve the EIT transmission to nearly 100% 
and reduce the overlap of photons from the 
single-photon source. A looming challenge is 
to reconfigure the experiment so that the two- 
photon nonlinearity delays rather than absorbs 
excess photons*. This type of nonlinearity, 
which preserves the number of photons, would 
be extremely useful for quantum-information 
purposes. 

In one respect, Peyronel and colleagues have 


SYSTEMS BIOLOGY 


demonstrated a quality single-photon light 
source that has a rate of emission in the mega- 
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photon-photon interactions at the two-pho- 
ton level — should also lead to various other 
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The small genomes of some bacteria could provide the first complete 
understanding of a biological system. A new computer model brings 
this goal closer, by calculating every process ina dividing Mycoplasma cell. 
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reductionism to the limit: to describe a cell 

as a set of interacting components and to 
capture whole-cell behaviour in a computer 
model. A good model doesn’t simply recapitu- 
late the observed behaviours that are fed into 
it. Rather, the aim is to predict the unknown 
effect of any novel perturbation or mutation. 
Such goals are very ambitious because of the 
challenge of attempting to obtain quantitative 
information on every one of the cell’s gene 
products and metabolites. Nevertheless, Karr 
et al.', writing in Cell, present the most com- 
prehensive model of a bacterial cell cycle so far, 
built on the basis of individual molecules and 
their relationships. Impressively, the model 
can predict gene-expression levels and cell- 
replication times in the challenging context of 
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mutations involving gene deletions. 

Mycoplasma genitalium is a urogenital 
bacterial parasite that has only 525 genes, 
making it one of the smallest genomes of any 
independently dividing cell — for comparison, 
the gut bacterium Escherichia coli has around 
4,000 genes. Because of their status as one of 
the ‘simplest’ cells, Mycoplasma species are 
rapidly becoming the most measured biologi- 
cal systems in history, and full descriptions 
of their molecular content, in terms of DNA, 
RNA, protein and metabolites, are available’. 
The cells are therefore considered to be the 
ideal target for whole-cell modelling’. 

What is striking about Karr and colleagues’ 
modelis the sheer ambition of its scale and its 
attention to detail. The authors retrieved (and 
in some cases retested) more than 1,900 exper- 
imentally derived cellular parameters, such as 
enzymatic reaction rates and protein-binding
affinities, from around 900 publications.
They then combined these to make 28 
sub-models of cellular processes, such 
as metabolism, protein translation and 
DNA replication. They used sub-models 
so that they could apply the appropriate 
modelling method for each process. In 
computational biology, this requirement 
has been neatly summarized’ as “Don't 
model bulldozers with quarks”. So the 
authors combined different modelling 
techniques involving varying levels 
of detail, to allow different factors — 
including dependence on deterministic 
reactions, known constraints, prob- 
ability and random variability — to be 
applied where appropriate. 

Crucially, the authors then used a 
computational trick to join up the sub- 
models (Fig. 1a). Models calculate vari- 
ables — numbers that represent varying 
system states. And variables change 
according to sets of rules — the equa- 
tions and parameters used to describe 
the system. The authors allowed each 
sub-model to calculate independently 
the values of a set of 16 variables at a 
time-step of approximately one sec- 
ond. They then combined these results, 
which generated a new set of vari- 
ables, and the process was repeated in 
a loop. Thus, all the sub-models ‘com- 
municated’ with one another and the 
cell’s status was constantly updated 
and recalculated. Although this is an 
approximation, because in reality all 
processes happen simultaneously, the 
end results converged plausibly towards 
the decision to divide, which the authors 
assessed as the moment that the bacterial cell 
membrane ‘pinched’ together to form two new 
cells (Fig. 1b). 
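To make the looping scheme concrete, here is a minimal Python sketch of the general idea; it is not Karr and colleagues' code. The three sub-models (`metabolism`, `replication`, `cytokinesis`), the state variables and all rates and thresholds are hypothetical stand-ins chosen so the arithmetic is easy to follow. At each time-step, every sub-model reads the same snapshot of the shared state and proposes changes; the proposals are then merged, and the loop repeats until a division criterion is met.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
# A sub-model reads a snapshot of the shared state and returns proposed changes.
SubModel = Callable[[State, float], State]

GENOME_LENGTH = 400.0    # hypothetical genome size, arbitrary nucleotide units
SEPTUM_TO_DIVIDE = 50.0  # hypothetical constriction needed to pinch into two cells

def metabolism(s: State, dt: float) -> State:
    """Hypothetical sub-model: synthesizes dNTPs at a constant rate."""
    return {"dntp": 5.0 * dt}

def replication(s: State, dt: float) -> State:
    """Hypothetical sub-model: copies DNA while dNTPs remain, consuming them."""
    if s["dna"] >= GENOME_LENGTH:
        return {}
    used = min(s["dntp"], 4.0 * dt)
    return {"dntp": -used, "dna": used}

def cytokinesis(s: State, dt: float) -> State:
    """Hypothetical sub-model: constricts the membrane once DNA is fully copied."""
    if s["dna"] >= GENOME_LENGTH:
        return {"septum": 1.0 * dt}
    return {}

def simulate(state: State, submodels: List[SubModel], dt: float = 1.0,
             max_steps: int = 100_000) -> Tuple[float, State]:
    """All sub-models compute from one snapshot per step; results are then merged."""
    t = 0.0
    for _ in range(max_steps):
        snapshot = dict(state)  # every sub-model sees the state at the start of the step
        proposals = [model(snapshot, dt) for model in submodels]
        for changes in proposals:  # combine the independent results
            for key, delta in changes.items():
                state[key] = state.get(key, 0.0) + delta
        t += dt
        if state["septum"] >= SEPTUM_TO_DIVIDE:  # division ends the loop
            return t, state
    raise RuntimeError("cell did not divide within max_steps")

division_time, final_state = simulate(
    {"dntp": 0.0, "dna": 0.0, "septum": 0.0},
    [metabolism, replication, cytokinesis],
)
print(division_time, final_state)
```

Because each sub-model sees only the snapshot taken at the start of the step, two sub-models drawing on the same pool could together overdraw it; this sketch does not arbitrate that competition, which a real integration scheme has to handle.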

After some optimization, the model pro- 
duced estimates of metabolite concentrations, 
metabolism rates, and messenger RNA and 
protein levels, that were similar to experimen- 
tal data. The model also allowed the authors to 
make several predictions about cell behaviour, 
including that 90% of the cell’s genes will be 
expressed in the first 2.5 hours of the approxi- 
mately 9-hour cell cycle. This prediction sug- 
gests that the chromosomes are ‘explored’ 
rapidly by gene-expression machinery. The key 
test, however, was whether higher-level system 
properties, such as the time taken for the cell 
to replicate itself, would be correctly predicted 
for bacteria carrying genetic mutations. When 
the researchers ran the model with each of the 
525 genes individually deleted, they found that 
284 of the genes are essential for cell survival 
and 117 are non-essential. These numbers are 
approximately 80% in agreement with experi- 
mental data for gene deletions that have been 
assessed previously. 
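The agreement figure is a simple proportion: among genes that have both a model prediction and an experimental essentiality call, the fraction whose calls match. A minimal Python sketch, with hypothetical gene names and calls (not the study's data):

```python
from typing import Dict, Tuple

def essentiality_agreement(predicted: Dict[str, bool],
                           observed: Dict[str, bool]) -> Tuple[int, int, float]:
    """Return (matching calls, genes compared, fraction in agreement).

    Only genes present in both dictionaries are compared; True means essential.
    """
    shared = sorted(predicted.keys() & observed.keys())
    if not shared:
        raise ValueError("no genes with both a prediction and an experimental call")
    matches = sum(predicted[g] == observed[g] for g in shared)
    return matches, len(shared), matches / len(shared)

# Hypothetical calls; geneF has no experimental call, so it is not compared.
predicted = {"geneA": True, "geneB": True, "geneC": False,
             "geneD": False, "geneE": True, "geneF": True}
observed = {"geneA": True, "geneB": False, "geneC": False,
            "geneD": False, "geneE": True}
print(essentiality_agreement(predicted, observed))
```

Restricting the comparison to genes with an experimental call mirrors the text's framing: the roughly 80% agreement refers to gene deletions that have been assessed previously, not to all 525 genes.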

The authors also tested the growth rates of
12 of the gene-deleted bacterial strains, and
found 8 to be within the limits predicted by
the model. In some of these cases, the experi-
ments resolved discrepancies between the
model and published growth rates, identify-
ing, for example, a previously undescribed
slow-growth mutant.

Figure 1 | Looping calculations to model cell division.
a, Karr et al.' have constructed a computer model that attempts
to calculate every process in Mycoplasma genitalium cells. Their
modelling strategy involves 28 independent sub-models of cell
processes, each incorporating different methods and levels
of detail. The sub-models communicate by combining their
calculations for 16 cell variables for approximately one second
of the cell's life cycle, and then calculating the next second. The
looping process culminates when cell division is induced. b, A
scanning electron micrograph of M. genitalium cells, before (left)
and during division (right). Scale bar, 0.5 micrometres.

So, can the authors claim to have recreated 
a cell in a computer? They themselves say that
the model should be compared to the first draft 
of the human genome, and be considered a 
work in progress. However, in modelling, dis- 
crepancies between predictions and experi- 
mental results are the key to improvements 
— they direct more detailed analyses and 
model refinement, and ultimately lead to better 
models. More challenging tests could be imag- 
ined. For example, could the model predict 
synthetic lethal mutants, in which the com-
bination of two gene deletions will kill a cell,
although either deletion alone permits sur- 
vival? Furthermore, any model that attempts 
to predict phenotypes’ (biological proper- 
ties) from genotypes (gene sequences) will be 
subject to the problem that even genetically 
identical cells do not always give the same 
output. For example, random differences in 
the amount of chaperone proteins can ‘buffer’ 
mutations variably*. However, the Myco-
plasma model can track such variability
and therefore has the potential to predict 
these outcomes. 

The metaphor of gene networks being 
connected in wiring diagrams is becom- 
ing commonplace and, even though 
such networks can be non-intuitive’, 
they are ideal for computer modelling. 
Nevertheless, one of the most excit- 
ing ideas in studies of gene regulation 
is that network relationships may not 
always involve direct molecular inter- 
actions. For example, imagine a gene 
that is required for cell division — if 
there is low gene expression, the cell 
will not divide and so gene-expression 
levels will have longer to accumulate, 
which can create a feedback loop, or 
gene expression ‘according to need’. 
Fascinatingly, there is already a hint of 
this in one example from Karr and col- 
leagues’ model. They find that cell-cycle 
time is affected by the concentrations of 
DNA nucleotides (dNTPs), which are 
required for DNA replication. When 
dNTP levels are low, the cycle slows 
at the point of replication initiation, 
allowing dNTPs to build up, which then 
speeds up the rest of the cycle. 
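The dNTP example can be caricatured in a few lines. In this hypothetical Python toy (the supply rate, pool threshold, fork capacity and genome size are invented units, not parameters from Karr and colleagues' model), replication initiation is gated on the dNTP pool reaching a threshold. Once replication starts, forks copy DNA at up to a fixed capacity, drawing down the stockpile.

```python
from typing import Optional, Tuple

def run_cycle(synthesis_rate: float, gate_threshold: float,
              genome: float = 400.0, fork_capacity: float = 8.0,
              max_steps: int = 100_000) -> Tuple[int, int, int]:
    """Toy dNTP-gated replication (all units hypothetical).

    Returns (steps before initiation, replication steps, total steps).
    """
    dntp, copied = 0.0, 0.0
    initiated_at: Optional[int] = None
    for t in range(1, max_steps + 1):
        dntp += synthesis_rate  # dNTP supply for this step
        # Initiation waits until the dNTP pool reaches the threshold.
        if initiated_at is None and dntp >= gate_threshold:
            initiated_at = t
        if initiated_at is not None:
            # Forks copy as fast as capacity allows, limited by available dNTPs.
            used = min(dntp, fork_capacity, genome - copied)
            dntp -= used
            copied += used
            if copied >= genome:
                return initiated_at - 1, t - initiated_at + 1, t
    raise RuntimeError("genome not copied within max_steps")

for rate, gate in [(2.0, 0.0), (2.0, 200.0), (8.0, 200.0)]:
    print(f"supply={rate}, gate={gate}: (wait, replication, total) = {run_cycle(rate, gate)}")
```

With the low supply rate, gating moves time out of replication (101 steps instead of 200) and into the wait before initiation (99 steps), while the total stays at 200 steps; in this toy, total cycle time is fixed by how fast dNTPs are made, and the gate only redistributes it.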

Extrapolating from such findings, 
I can imagine that similar feedback 
processes might exist for any cellular 
factor that contributes indirectly to 
reducing its own concentration. For 
example, factors for which transiently 
low concentrations reduce the activ- 
ity of pathways for cell division, pro- 
tein secretion or protein degradation 
might similarly self-regulate, restoring and 
buffering themselves over time. So perhaps 
the most exciting thing about a whole-cell 
model is that it may allow us to look beyond 
the direct molecular ‘cogs and wheels’ 
that drive biology and into the emergent 
properties of biological systems.
Novel mutations target distinct
subgroups of medulloblastoma 


Giles Robinson, Matthew Parker, Tanya A. Kranenburg, Charles Lu, Xiang Chen, Li Ding,
Timothy N. Phoenix, Erin Hedlund, Lei Wei, Xiaoyan Zhu, Nader Chalhoub, Suzanne J. Baker, Robert Huether,
Richard Kriwacki, Natasha Curley, Radhika Thiruvenkatam, Jianmin Wang, Gang Wu, Michael Rusch, Xin Hong,
Jared Becksfort, Pankaj Gupta, Jing Ma, John Easton, Bhavin Vadodaria, Arzu Onar-Thomas, Tong Lin,
Shaoyi Li, Stanley Pounds, Steven Paugh, David Zhao, Daisuke Kawauchi, Martine F. Roussel,
David Finkelstein, David W. Ellison, Ching C. Lau, Eric Bouffet, Tim Hassall, Sridharan Gururangan,
Richard Cohn, Robert S. Fulton, Lucinda L. Fulton, David J. Dooling, Kerri Ochoa, Amar Gajjar,
Elaine R. Mardis, Richard K. Wilson, James R. Downing, Jinghui Zhang & Richard J. Gilbertson


Medulloblastoma is a malignant childhood brain tumour comprising four discrete subgroups. Here, to identify mutations 
that drive medulloblastoma, we sequenced the entire genomes of 37 tumours and matched normal blood. One-hundred 
and thirty-six genes harbouring somatic mutations in this discovery set were sequenced in an additional 56 
medulloblastomas. Recurrent mutations were detected in 41 genes not yet implicated in medulloblastoma; several 
target distinct components of the epigenetic machinery in different disease subgroups, such as regulators of H3K27 
and H3K4 trimethylation in subgroups 3 and 4 (for example, KDM6A and ZMYM3), and CTNNB1-associated chromatin 
remodellers in WNT-subgroup tumours (for example, SMARCA4 and CREBBP). Modelling of mutations in mouse lower
rhombic lip progenitors that generate WNT-subgroup tumours identified genes that maintain this cell lineage (DDX3X),
as well as mutated genes that initiate (CDH1) or cooperate (PIK3CA) in tumorigenesis. These data provide important new
insights into the pathogenesis of medulloblastoma subgroups and highlight targets for therapeutic development.


Medulloblastoma is the most common malignant childhood brain 
tumour". The disease includes four subgroups (sonic hedgehog (SHH) 
subgroup, WNT subgroup, subgroup 3 and subgroup 4), defined 
primarily by gene expression profiling, that show differences in 
karyotype, histology and prognosis’. Studies of genetically engineered 
mice show that these tumours arise from different cell types: SHH- 
subgroup medulloblastomas develop from committed cerebellar 
granule neuron progenitors (GNPs) in Ptch1+/− mice**; WNT-
subgroup tumours are generated by lower rhombic lip progenitors
(LRLPs) in Blbp-Cre;Ctnnb1+/lox(Ex3);Tp53flx/flx mice®; whereas
subgroup-3 medulloblastomas probably arise from an undefined class 
of cerebellar progenitors®. The identification of medulloblastoma sub- 
groups has not changed clinical practice. All patients currently 
receive the same combination of surgery, radiation and chemotherapy. 
This aggressive treatment fails to cure two thirds of patients with 
subgroup-3 disease, and probably over-treats children with WNT- 
subgroup medulloblastoma who invariably survive with long-term 
cognitive and endocrine side effects”’. Drugs targeting the genetic 
alterations that drive each medulloblastoma subgroup could prove 
more effective and less toxic, but the identity of these alterations 
remains largely unknown. 


The genomic landscape of medulloblastoma 


To identify genetic alterations that drive medulloblastoma, we per- 
formed whole-genome sequencing (WGS) of DNA from 37 tumours 
and matched normal blood (discovery cohort). Tumours were sub- 
grouped by gene expression (WNT subgroup, n = 5; SHH subgroup, 
n = 5; subgroup 3, n = 6; subgroup 4, n = 19; ‘unclassified’ (profiles 
not available), n= 2; Fig. 1, Supplementary Figs 1-3 and Sup- 
plementary Table 1). Validation of all putative somatic alterations 
including single nucleotide variations (SNVs), insertion/deletions 
(indels) and structural variations (SVs) identified by CREST®, was 
conducted for 12 tumours using custom capture arrays and 
Illumina-based DNA sequencing (Supplementary Table 2). Putative 
coding alterations and SVs were validated in the remaining 25 
discovery cohort cases by polymerase chain reaction (PCR) and 
Sanger-based sequencing. Mutation frequency was determined in a 
separate ‘validation cohort’ of 56 medulloblastomas (WNT subgroup, 
n = 6; SHH subgroup, n = 8; subgroup 3, n = 11; subgroup 4, n = 19; 
unclassified, n = 12; Fig. 1 and Supplementary Table 1). 

Figure 1 | The genomic landscape

WGS of the discovery cohort detected 22,887 validated or high-
quality somatic sequence mutations (SNVs and indels), 536 validated
or curated SVs, and 5,802 copy number variations (CNVs; 92%
concordant with 6.0 SNP mapping arrays; Supplementary Tables 3-6
and Supplementary Figs 4-7). In all but five tumours with the highest
mutation rates, >50% of SNVs were C→T/G→A transitions
(Supplementary Fig. 8). The mean missense:silent mutation ratio 
was 3.6:1 and 40% of all missense mutations were predicted to be 
deleterious, suggesting a selective pressure for SNVs that affect protein 
coding (Supplementary Table 5). Global patterns of total SNVs 
and amplifications varied significantly among medulloblastoma 
subgroups, even when corrected for age and sex, supporting the notion 
that these tumours are distinct pathological entities (Fig. 1 and 
Supplementary Fig. 6). Custom capture-based analysis of the allele 
frequency of all somatic mutations in 12 medulloblastomas allowed 
us to predict the ancestry of certain genetic alterations, suggesting 
that aneuploidy precedes widespread sequence mutation in medullo- 
blastomas with highly mutated genomes (Supplementary Figs 9-11). 
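The two summary statistics above, the fraction of SNVs that are C→T/G→A transitions and the missense:silent ratio, are simple tallies once each SNV has been annotated. A minimal Python sketch, assuming SNVs arrive as hypothetical `(ref, alt, effect)` tuples with the coding effect already assigned (the study's annotation pipeline is not described here, and the example calls are invented):

```python
from collections import Counter
from typing import Iterable, Tuple

SNV = Tuple[str, str, str]  # (reference base, alternate base, coding effect)

def is_c_to_t(ref: str, alt: str) -> bool:
    """C>T on one strand is reported as G>A on the complementary strand."""
    return (ref, alt) in {("C", "T"), ("G", "A")}

def spectrum_summary(snvs: Iterable[SNV]) -> Tuple[float, float]:
    """Return (fraction of C>T/G>A transitions, missense:silent ratio)."""
    snvs = list(snvs)
    if not snvs:
        raise ValueError("no SNVs supplied")
    c_to_t = sum(is_c_to_t(ref, alt) for ref, alt, _ in snvs)
    effects = Counter(effect for _, _, effect in snvs)
    silent = effects["silent"]
    ratio = effects["missense"] / silent if silent else float("inf")
    return c_to_t / len(snvs), ratio

# Hypothetical annotated SNVs from one tumour (not data from the study).
example = [
    ("C", "T", "missense"), ("G", "A", "missense"), ("C", "T", "silent"),
    ("A", "G", "missense"), ("G", "A", "missense"), ("T", "C", "nonsense"),
    ("C", "A", "missense"), ("G", "A", "silent"),
]
print(spectrum_summary(example))
```

Counting C>T and G>A together reflects the fact that the same change can be reported on either DNA strand.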


Novel CNVs and SVs are rare in medulloblastoma 
The repertoire of focally amplified or deleted genes seems to be very 
limited in medulloblastoma. We detected expected’ gains of MYC, 
MYCN and OTX2 in subgroups 3 and 4, but no novel recurrent 
amplifications (Fig. 1, Supplementary Fig. 12 and Supplementary 
Table 7). In keeping with recent reports’, high-level amplification of 
MYCN in subgroup-3 sample no. 16 (sample numbering as in Fig. 1) was 
generated by chromothripsis; although chromothripsis was observed 
infrequently (n = 2/37 of the discovery cohort; Supplementary Fig. 13). 

Focal homo- or heterozygous deletions of genes previously impli- 
cated in medulloblastoma were also detected (for example, PTCH1, 
PTEN; Fig. 1)'*"! but novel recurrent focal deletions were rare. Three 
subgroup-4 tumours (nos 11-13) and one unclassified tumour 
deleted DDX31, AK8 and TSC1 at chromosome 9q34.14 in concert 
with OTX2 amplification, suggesting that these alterations are coop- 
erative (P<0.0005, Fisher’s exact test). The breakpoint in this 
deletion occurs in DDX31, and two samples contained a missense 
mutation (subgroup 4, no. 15) and complex rearrangement 
(unidentified case SJMB026) in this gene, suggesting that DDX31 is 
the target of these alterations (Supplementary Fig. 14). 

Over 50% of SVs detected by WGS broke the coding region of at 
least one gene, but less than 2% (n = 6/314, excluding two tumours 
with excessive SVs) encode potential in-frame fusion proteins
(Supplementary Fig. 15); none affect the same gene or signal pathway. 
Therefore, fusion proteins are likely to be an uncommon transform- 
ing mechanism in medulloblastoma. 

Although germline mutations in TP53, PTCH1, APC and CREBBP 
predispose to medulloblastoma'’"*, only 23 mutations previously 
associated with cancer were detected in discovery cohort germ lines. 
Only one of these—in a known case of Turcot’s syndrome—was 
accompanied by a somatic mutation (germline APC Y935*/somatic 
deletion; WNT subgroup no. 11; Supplementary Table 8). Thus, 
inherited forms of medulloblastoma seem to be rare in our cohort. 


Novel mutations in medulloblastoma subgroups 

Because SVs and CNVs are unlikely to drive most medulloblastomas, 
we investigated whether recurrent (more than two samples) somatic 
SNVs and/or indels might target discrete genes and pathways. This 
analysis identified 49 genes, across all 93 tumours, which were tar- 
geted by non-silent, recurrent, somatic mutations; 84% (n = 41/49) 
have not yet been implicated in medulloblastoma (Supplementary 
Tables 9 and 10). Several of these congregated in disease subgroups 
and converged on specific cell pathways (Fig. 1, Supplementary Fig. 8 
and Supplementary Table 11). 


Histone methylation is deregulated in subgroups 3 and 4 
The H3K27 trimethyl mark (H3K27me3) represses lineage-specific genes 
in stem cells'* (Supplementary Fig. 8). H3K27me3 is written by the 
polycomb repressive complex 2 (PRC2) that includes the methylase 
EZH2 (refs 16, 17) and is erased during differentiation by the demethylase 
KDM6A™. As H3K27me3 is erased, chromatin remodellers recruited to 
H3K4me3 promote differentiation, for example, CHD7 (refs 19, 20). This 
process is tightly controlled during development and deregulated in 
cancers; EZH2 is mutated in lymphomas” and upregulated in breast” 
and prostate** cancer, while biallelic inactivation of KDM6A (chro- 
mosome Xp11.2) or KDM6A and its paralogue UTY (chromosome 
Yq11), occurs in adult female and male cancers, respectively™. 
Hypergeometric distribution analyses revealed selective muta- 
tion of histone modifiers in subgroup-3 and -4 medulloblastomas 
(Supplementary Table 11). Six subgroup-4, one subgroup-3, and 
one unclassified medulloblastoma contained novel inactivating muta-
tions in KDM6A (Figs 1 and 2 and Supplementary Figs 8 and 16). The
single female with a KDM6A splice-site mutation showed a deletion of 
the second allele that escapes X inactivation” (subgroup 4, no. 15), and 
57% (n= 4/7) of KDM6A-mutant male medulloblastomas deleted 
chromosome Y, compared with only 6% (n = 3/51) of male
KDM6A wild-type tumours (P< 0.005, Fisher's exact test; Fig. 1).
Thus, a two-hit model of KDM6A-UTY tumour suppression seems 
to operate in subgroup-4 medulloblastomas. Notably, mutations in 
six other KDM family members (KDM1A, KDM3A, KDM4C,
KDM5A, KDM5B and KDM7A) were detected exclusively in 
subgroup-3 and -4 tumours, implicating broad disruption of lysine 
demethylation in these medulloblastomas (Fig. 1, Supplementary 
Table 11 and Supplementary Fig. 16). 
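The chromosome Y comparison above is a 2×2 contingency test, and the hypergeometric enrichment analyses rest on the same distribution. A minimal pure-Python sketch follows; it is a reimplementation for illustration, not the authors' software. The counts are the ones reported in the text (chromosome Y deleted in 4 of 7 KDM6A-mutant and 3 of 51 KDM6A wild-type male tumours). SciPy's `scipy.stats.fisher_exact` provides the same test if a library routine is preferred.

```python
from math import comb
from typing import Dict, Tuple

def _table_weights(a: int, b: int, c: int, d: int) -> Tuple[Dict[int, int], int]:
    """Integer hypergeometric weights for all 2x2 tables sharing the margins of [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # P(top-left cell = x) = comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    weights = {x: comb(col1, x) * comb(n - col1, row1 - x) for x in range(lo, hi + 1)}
    return weights, comb(n, row1)

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum the probabilities of all tables no more likely than the observed one."""
    weights, total = _table_weights(a, b, c, d)
    observed = weights[a]  # integer weights make this comparison exact
    return sum(w for w in weights.values() if w <= observed) / total

def hypergeom_upper_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided enrichment: probability of a top-left count at least as large as a."""
    weights, total = _table_weights(a, b, c, d)
    return sum(w for x, w in weights.items() if x >= a) / total

# Rows: KDM6A-mutant vs KDM6A wild-type male tumours.
# Columns: chromosome Y deleted vs retained.
p_two_sided = fisher_exact_two_sided(4, 3, 3, 48)
p_enrichment = hypergeom_upper_tail(4, 3, 3, 48)
print(f"two-sided P = {p_two_sided:.4f}; one-sided enrichment P = {p_enrichment:.4f}")
```

Both values come out at about 0.0025, consistent with the reported P < 0.005. Here they coincide because every table less likely than the observed one lies in the upper tail.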

Subgroup-3 and -4 medulloblastomas also gained and overexpressed 
EZH2 (chromosome 7q35-34), which writes H3K27me3, and 
contained novel inactivating mutations in effectors and regulators of 
the H3K4me3 mark” (Fig. 2a and Supplementary Fig. 8). Gain of 
chromosome 7q was significantly enriched among subgroup-3 and 
-4 medulloblastomas (P < 0.005, Fisher’s exact test) and correlated 
directly with EZH2 expression. Indeed, EZH2 was the eighth most 
significantly overexpressed gene on chromosome 7 among subgroup-3 
and -4 medulloblastomas that gained chromosome 7q relative to those
with diploid chromosome 7 (P<0.005, Bonferroni correction). 
Nonsense and frameshift mutations were detected in CHD7 in four 
subgroup-3 and -4 tumours. ZMYM3 (chromosome Xq13.1), which 
participates in a protein complex with KDM1A to regulate gene
expression at the H3K4me3 mark”, was targeted by novel frameshift, 
nonsense and missense mutations in three male subgroup-4 medullo- 
blastomas. All three tumours with mutations in ZMYM3 also mutated 
KDM6A (subgroup 4, nos 19, 20) or KDM1A (subgroup 4, no. 21), 
suggesting that these alterations are cooperative. Remarkably, 
KDM6A, CHD7 and ZMYM3 mutations were confined to subgroups 
3 and 4, and clustered in samples with sub-median EZH2 expression 
levels (Fig. 2a; P< 0.05, Fisher's exact test). These data suggest that
subgroup-3 and -4 medulloblastomas retain a stem-like epigenetic
state by aberrantly writing (EZH2 upregulation) or preserving
(KDM6A-UTY inactivation) H3K27me3, or disrupting H3K4me3-
associated transcription (CHD7 and ZMYM3 inactivation). Indeed,
human and mouse subgroup-3 and -4 medulloblastomas contained
significantly more H3K27me3 than did WNT- or SHH-subgroup
tumours (Fig. 2b). Thus, gain of EZH2 and loss of KDM6A probably
maintain H3K27me3 in subgroup-3 and -4 medulloblastomas.

Figure 2 | Deregulation of H3K27me3 in subgroup-3 and -4 human and
mouse medulloblastoma. a, Top row, SNP profiles of chromosome 7 (Chr 7)
copy number in medulloblastomas (samples as Fig. 1; asterisk indicates
subgroup-3 cases). Second row, expression of EZH2. Subgroup-3 and -4 tumours are
ordered left to right by expression level; dagger indicates median expression point
(Bonferroni-corrected P value of EZH2 expression versus chromosome 7 gain).
Third row, mutation status of KDM6A, CHD7 and ZMYM3 (P value, Fisher's
exact test, mutations versus EZH2 expression). Fourth row, H3K27me3

Finally, we looked to see if the differential expression of H3K27me3 
among medulloblastoma subgroups reflects ancestral chromatin 
marking in the progenitors that generate these tumours (Fig. 2b). 
Relatively low levels of H3K27me3 were detected in LRLPs and com- 
mitted GNPs, which generate WNT- and SHH-subgroup medullo- 
blastomas, respectively**, potentially explaining why mutations that 
preserve this epigenetic mark are absent from these tumours. We 
recently showed that subgroup-3 medulloblastomas arise from a rare 
fraction of cerebellar progenitors’. We are currently investigating 
whether these progenitors are found among the H3K27me3-positive 
cells seen in the external germinal layer (Fig. 2b). 
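The EZH2 ranking reported earlier relies on correcting per-gene P values for the number of chromosome 7 genes compared. A minimal Python sketch of a Bonferroni adjustment followed by ranking; the gene names and P values are invented placeholders, since the study's per-gene values are not given in the text.

```python
from typing import Dict, List, Tuple

def bonferroni_rank(p_values: Dict[str, float],
                    alpha: float = 0.05) -> List[Tuple[str, float, bool]]:
    """Bonferroni-adjust per-gene P values and rank genes by adjusted P.

    Each P value is multiplied by the number of tests and capped at 1.
    Returns (gene, adjusted P, significant at alpha), most significant first.
    """
    m = len(p_values)
    adjusted = {gene: min(1.0, p * m) for gene, p in p_values.items()}
    ranked = sorted(adjusted.items(), key=lambda item: item[1])
    return [(gene, p_adj, p_adj <= alpha) for gene, p_adj in ranked]

# Hypothetical per-gene P values (not data from the study).
example = {"GENE_A": 0.0001, "GENE_B": 0.3, "GENE_C": 0.004, "GENE_D": 0.02}
for gene, p_adj, significant in bonferroni_rank(example):
    print(gene, round(p_adj, 4), significant)
```

Bonferroni is conservative: with thousands of genes on a chromosome, a gene must reach a very small raw P value to remain significant after adjustment.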


Novel mutations in WNT-subgroup medulloblastomas 

WNT-subgroup medulloblastomas contained mutations in epigenetic 
regulators that are different to those seen in subgroup-3 and -4 disease. 
CTNNB1, the principal effector of the WNT pathway, forms a
transcription factor with the T-cell factor/lymphoid enhancer factor
(TCF/LEF)**. The carboxy terminus of CTNNB1 then recruits a series
of protein complexes that remodel chromatin and promote transcrip- 
tion at WNT-responsive genes (Supplementary Fig. 8). These include: 
histone acetyltransferases (for example, CREBBP and TRRAP-TIP60 
complexes)**”’; ATPases of the SWI/SNF family (for example, 
SMARCA4)*; and the mediator complex that coordinates RNA 
polymerase II placement (for example, MED13)*". As expected, 
>70% (n= 8/11) of WNT-subgroup medulloblastomas contained
mutations that stabilize CTNNB1 (Fig. 1 and Supplementary Fig. 8;
P<0.0001, Fisher's exact test)*”**. A single subgroup-3 case (no. 5)
also showed a mutation in CTNNB1, but this mutation has not
been reported in cancer, did not upregulate nuclear CTNNB1
(Fig. 1) and is of unclear relevance. Remarkably, six WNT-subgroup
medulloblastomas showed mutations in chromatin modifiers that
are recruited to TCF/LEF WNT-responsive genes by CTNNB1
(Fig. 1 and Supplementary Fig. 8). Four WNT-subgroup tumours
contained heterozygous missense mutations in the helicase domain of
SMARCA4 (P < 0.002, Fisher's exact test); two samples, including one
with a SMARCA4 mutation (no. 5), contained nonsense mutations in
CREBBP (WNT-subgroup enrichment, P< 0.02, Fisher's exact test);
and missense mutations in TRRAP and MED13 were detected in a single
WNT-subgroup medulloblastoma each. Thus, in addition to stabiliza-
tion of CTNNB1, the development of WNT-subgroup medulloblastoma
may require disruption of chromatin remodelling at WNT-responsive
genes.

Figure 2 legend (continued): H3K27me3 immunohistochemistry (numbers indicate
colorimetry; P value, ANOVA). GP4-7 indicates case subgroup-4, no. 7. a.u., arbitrary
units. N/A, not available. b, H3K27me3 expression in mouse
Blbp-Cre;Ctnnb1+/lox(Ex3);Tp53flx/flx (WNT-subgroup), Ptch1+/−;Tp53−/− (SHH-subgroup)
and Myc;Ink4c−/− (subgroup-3) medulloblastomas (right) and developing hindbrain (left).
High-power views of E14.5 LRL (i) and upper rhombic lip (URL) (ii). EGL, external
germinal layer; IGL, internal granule layer. Scale bar, 50 μm. White arrows in P7
cerebellum (CB) pinpoint H3K27me3-positive cells in the EGL.

A small number of WNT-subgroup medulloblastomas lack muta-
tions in CTNNB1 or APC, suggesting that alternative mechanisms
drive aberrant WNT signals in these tumours. Three WNT-subgroup 
medulloblastomas in our series contained wild-type CTNNB1 (nos 1, 
10 and 11; Fig. 1). Sample no. 11 inactivated APC as the sole case of 
Turcot’s syndrome in our study, but this tumour and sample no. 10 
also contained novel missense mutations in CDH1 (R63G, V329F; 
WNT-subgroup enrichment, P< 0.05, Fisher’s exact test; Fig. 1). 
CDH1 sequesters CTNNB1 at the cell membrane*, and mutations 
that disrupt this interaction promote WNT signalling in adult 
cancers***. The functional consequences of CDH1(R63G) and
CDH1(V329F) remain to be determined, but their restriction to
WNT-subgroup tumours, mutual exclusivity with CTNNB1 muta-
tions, and adjacency to residues mutated in breast cancer
(http://www.sanger.ac.uk/genetics/CGP/cosmic/) suggest they might pro-
mote aberrant WNT signals in medulloblastoma.

We showed previously in mice that mutant Ctnnb1 initiates WNT-
subgroup medulloblastoma by arresting the migration of LRLPs from
the embryonic dorsal brainstem to the pontine grey nucleus (PGN)°.
Therefore, to test whether disruption of CDH1 might substitute for
mutant CTNNB1 in medulloblastoma, we used short hairpin (sh)RNAs
to knock down Cdh1 in embryonic day (E)14.5 mouse LRLPs (Fig. 3a-c).
Knockdown of Cdh1 expression upregulated Tcf/Lef-mediated gene tran-
scription in LRLPs and more than doubled their self-renewal capacity
(Fig. 3b). Furthermore, in utero electroporation of LRLPs with Cdh1 
shRNAs impeded their migration from the dorsal brainstem to the 
PGN with an efficiency similar to that of mutant Ctnnb1 (Fig. 3d, e; 
see Supplementary Methods). These data support the hypothesis that 
CDH1 suppresses the formation of WNT-subgroup medulloblastoma 
by regulating WNT signals in LRLPs.

WNT-subgroup medulloblastomas were also enriched for novel, 
recurrent somatic missense mutations in the DEAD-box RNA helicase 
DDX3X at chromosome Xp11.3 (P < 0.0001, Fisher’s exact test; Fig. 1). 
DDX3X regulates several critical cell processes including chromosome
segregation’’, cell cycle progression**, gene transcription and trans- 
lation*’. Previously reported cancer-associated mutations in DDX3X 
disrupt the ATPase activity of the protein, but seven of eight muta- 
tions identified in our series clustered in the DEAD-box domain 
(Supplementary Information and Supplementary Fig. 8). Structural 
modelling predicts that these mutations interfere with nucleic acid 
binding, possibly altering specificity and/or affinity for RNA substrates, 
rather than inactivating DDX3X (Supplementary Figs 17-22). Indeed, 
the wild-type allele of DDX3X that escapes X inactivation”* was retained 
by two of three DDX3X-mutant female medulloblastomas, and knock- 
down of Ddx3x halved the self-renewal rate of mouse LRLPs, suggest- 
ing that this protein is important for the proliferation and/or 
maintenance of the LRLP lineage (Fig. 3b). 

To understand better the role of DDX3X in WNT-subgroup 
medulloblastoma, we used our in utero migration assay to assess 
the impact of Ddx3x shRNAs, mutant Ddx3x'?7™™ (identified in 
WNT-subgroup sample no. 9), or mutant Ddx3x°??" (WNT sample
no. 8) on LRLPs.

Figure 3 | Genes mutated in WNT-subgroup medulloblastomas regulate
LRLPs. a, b, Isolated Olig3+/Wnt1+ LRLPs were transduced in b with mutant
Ctnnb1 (above hashed line) or the indicated shRNA-RFP (red fluorescent
protein) construct (below hashed line). LRLPs were also transduced (+) or not
(−) with a Tcf/Lef-enhanced green fluorescent protein (Tcf) reporter. Numbers on
right show clonal percentage 2° to 3° passage neurosphere formation
(± standard deviation (s.d.)). N/A, not applicable. Scale bar, 10 μm.
c, Knockdown of genes targeted by shRNA relative to control-transduced cells.
Data show mean ± s.d. d, Immunofluorescence of P1 mouse hindbrains
electroporated in utero at E14.5 with GFP (to control for equivalence of
electroporation between embryos) and the indicated construct. High-
power views of indicated areas are shown right. Cells targeted by Ddx3x shRNA
are present 48 h after electroporation but ablated by P1. Scale bars, 200 μm.
e, Heatmap showing the distribution of GFP+/RFP+ cells in electroporated mice
at P1. Median distance migrated by cells and P values of migration distance and
cell number relative to controls are shown. ****P < 0.00005; ***P < 0.0005;
**P < 0.005; *P < 0.05. Red and green text reports significant increase or
decrease, respectively, relative to control.

Remarkably, although Ddx3x shRNAs were expressed
abundantly in E14.5 brainstem cells within 48h of electropora- 
tion, =0.5% of Ddx3x-shRNA-positive cells were present by postnatal 
day (P)1, confirming the critical importance of this gene to maintain 
the LRLP lineage (Fig. 3d, e). In contrast, mice electroporated with 
either mutant Ddx3x'?”™ or Ddx3x°°** consistently contained 
~50% more labelled cells at P1 than did controls, although these cells
migrated normally (Fig. 3d, e and data not shown). Thus, mutations in 
DDX3X may contribute to WNT-subgroup medulloblastoma by 
increasing LRLP proliferation rather than perturbing the migration 
of their daughter cells. Notably, comparable knockdown in utero of 
Mll2, Gabrg1 and Kdm6a, which were selectively mutated in non-WNT
medulloblastomas, had no apparent impact on LRLPs, supporting the
value of our assay for assessing WNT-subgroup specific mutations and 
underscoring the importance of cell context for functional studies of 
genes mutated in cancer subgroups. 


PIK3CA mutations promote WNT-subgroup medulloblastoma 
Cancer-associated, activating mutations in PIK3CA were detected in 
a single case each of WNT-subgroup (PIK3CA(Q546K)), SHH- 
subgroup (PIK3CA(H1047R)) and subgroup-4 (PIK3CA(N345K)) 
medulloblastoma (Fig. 1 and Supplementary Fig. 23). Although 
PIK3CA mutations are common in adult cancers* and reported in 
medulloblastoma”, their role in tumorigenesis remains controversial. 
In particular it is not known if these mutations initiate or progress 
cancer. To test this, we generated mice that express a conditional 
allele of the Pik3ca’°*?* mutation. Mice harbouring Pik3ca’*?* 
or Pik3ca"** and Tp53™ were bred with Blbp-Cre, which drives 
efficient recombination in LRLPs?. Blbp-Cre;Pik3ca®°?* mice, with 
or without Tp53™, survived tumour free for a median of 212 
days with no evidence of aberrant LRLP migration (Fig. 4a and 
data not shown). In stark contrast, 100% (n=11/11) of Blbp- 
CresCtnnb1*/°*®);Tp53*";Pik3ca°?* mice developed WNT- 
subgroup medulloblastomas by 3 months of age; only 4% (n = 2/54) 
of Blbp-Cre;Ctnnb1*"");Tp53*™ mice develop WNT-subgroup 
medulloblastoma by 11 months (Fig. 4a, b). Pik3ca wild-type and 
mutant mouse medulloblastomas displayed similar ‘classic’ histologies 
and nuclear Ctnnb1*, but Pik3ca’°*?* mutant tumours contained 
greater AKT pathway activity as measured by pS6 and p4EBP1 
immunostaining. Thus mutations in PIK3CA probably activate 
the AKT pathway to progress, rather than initiate, WNT-subgroup 
medulloblastoma.

Figure 4 | Pik3ca"*** accelerates but does not initiate WNT-subgroup
medulloblastoma. a, Tumour-free survival of mice of the indicated genotype.
All mice carry the Blbp-Cre allele. Log rank P< 0.0001. b, Haematoxylin and
eosin (H&E) and immunohistochemical stains of indicated tumours.


SHH-subgroup medulloblastomas
Four of thirteen SHH-subgroup medulloblastomas contained
expected biallelic inactivating alterations in SUFU or PTCH1. What
drives aberrant SHH signals in the remaining cases remains unclear.
These tumours contained mutations in MLL2, TP53 and PTEN that 
have been reported previously in medulloblastoma”; but these muta- 
tions occur in other subgroups and are not known to activate SHH 
signals. Two SHH-subgroup tumours (nos 11 and 12) contained 
identical novel T48M mutations in the GABAA (γ-aminobutyric acid,
subtype A) receptor, γ1, which is predicted to be deleterious (Fig. 1
and Supplementary Table 9). Disruption of GABAA receptors can
enhance neural stem cell proliferation’, suggesting that these muta- 
tions might deregulate the proliferation of GNPs that generate SHH- 
subgroup medulloblastomas. 


Discussion 


We have identified several new, recurrent somatic mutations in spe-
cific subgroups of medulloblastoma. Alterations affecting EZH2,
KDM6A, CHD7 and ZMYM3 seem to disrupt chromatin marking 
of genes in subgroup-3 and -4 tumours. Further epigenetic studies 
will be required to uncover the identity of these genes, but evidence 
suggests these may include OTX2, MYC and MYCN*™*. As amplifica- 
tion of these genes was detected almost exclusively in subgroup-3 and 
-4 tumours that lacked mutations in KDM6A, CHD7 or ZMYM3, it is 
tempting to speculate that these genetic alterations target common 
transforming pathways. A recent study detected recurrent mutations 
in three other chromatin remodellers in medulloblastoma*: 
SMARCA4, MLL2 and MLL3, but this study did not include details 
of tumour subgroup. Here, we show that mutations in SMARCA4, 
CREBBP, TRRAP and MED13 are enriched in WNT-subgroup 
medulloblastomas; thereby uncovering potential cooperative muta- 
tions in chromatin remodellers and their binding-partner oncogene, 
CTNNBI. Thus, disruptions in the epigenetic machinery of medullo- 
blastoma are likely to be subgroup specific and may cooperate with 
other oncogenic mutations. The low incidence of MLL2 mutations 
detected in our study relative to previous work” probably reflects 
differences in study populations (see Supplementary Results). 

Although medulloblastoma is more prevalent in males, especially
with subgroup-3 and -4 disease, the reason for this sex bias is
unknown. One potential explanation is the location of medulloblastoma
oncogenes or tumour suppressor genes on chromosome X. Three of
the most recurrently mutated genes detected in our study are located on
chromosome X, of which two (ZMYM3 and KDM6A) were observed
almost exclusively in males. Mutations in these genes might explain
some of the male sex bias in medulloblastoma. The third mutated
X-chromosome gene, DDX3X, is more likely to be a WNT-subgroup
medulloblastoma oncogene. Three of four female medulloblastomas
carried heterozygous mutations in DDX3X, a gene that escapes X inactivation,
and our functional data indicate that mutations in this gene provide a
proliferative advantage to LRLPs that generate these tumours.

Our findings also have important implications for drug
development. Inhibitors of the epigenetic machinery, especially of enzymes that
maintain H3K27me3 (for example, the EZH2 methyltransferase), may be useful
treatments for subgroup-3 and -4 disease. These tumours include the
most aggressive forms of medulloblastoma, for which treatment
options are limited. Mutations that activate PIK3CA and DDX3X in
WNT-subgroup tumours might also be targeted with novel therapeutic
strategies. Future clinical trials of drugs that target these mutant
proteins must recruit the appropriate patient populations, as we
demonstrate that mutations show subgroup specificity in
medulloblastoma. Our accurate mouse models of WNT-subgroup, SHH-subgroup
and subgroup-3 medulloblastoma should help with future
studies of the biological and therapeutic importance of the novel
genetic alterations described in this study.


METHODS SUMMARY 


Human tumour and matched blood samples were obtained with informed 
consent through an institutional review board approved protocol at St Jude 
Children’s Research Hospital. WGS and analysis of WGS data were performed 
as previously described. Details of sequence coverage, custom capture and other
validation procedures are provided in Supplementary Information
(Supplementary Tables 12-15). Immunohistochemistry and immunofluorescence of
human and mouse tissues were performed using routine techniques and
appropriate primary antibodies as described (Supplementary Methods).
Medulloblastoma mRNA and DNA profiles were generated using Affymetrix
U133v2 and SNP 6.0 arrays, respectively (Supplementary Methods). Real-time
PCR with reverse transcriptase (RT-PCR) analysis of genes targeted in mouse
LRLPs by shRNAs was performed as described previously. LRLPs were isolated
and transduced with indicated lentiviruses in stem cell cultures or targeted
in utero with shRNAs or mutant cDNA sequences by electroporation as
described (Supplementary Information). Mice harbouring a Cre-inducible
Pik3caE545K allele were generated using homologous recombination: a
lox-puro-STOP-lox cassette was introduced immediately upstream of the exon
containing the initiation codon, and exon 9 was replaced with an exon containing the
E545K mutation. Pik3caE545K mice were bred with Blbp-Cre;Ctnnb1 lox(Ex3)/lox(Ex3)
and Tp53 mice to generate progeny of the appropriate genotype and
subjected to clinical surveillance.
Subgroup-specific structural variation
across 1,000 medulloblastoma genomes 
Medulloblastoma, the most common malignant paediatric brain tumour, is currently treated with nonspecific cytotoxic
therapies including surgery, whole-brain radiation, and aggressive chemotherapy. As medulloblastoma exhibits marked
intertumoural heterogeneity, with at least four distinct molecular variants, previous attempts to identify targets for
therapy have been underpowered because of small sample sizes. Here we report somatic copy number aberrations
(SCNAs) in 1,087 unique medulloblastomas. SCNAs are common in medulloblastoma, and are predominantly
subgroup-enriched. The most common region of focal copy number gain is a tandem duplication of SNCAIP, a gene
associated with Parkinson's disease, which is exquisitely restricted to Group 4α. Recurrent translocations of PVT1,
including PVT1-MYC and PVT1-NDRG1, that arise through chromothripsis are restricted to Group 3. Numerous
targetable SCNAs, including recurrent events targeting TGF-β signalling in Group 3, and NF-κB signalling in Group 4,
suggest future avenues for rational, targeted therapy.


Brain tumours are the most common cause of childhood oncological
death, and medulloblastoma is the most common malignant paediatric
brain tumour. Current medulloblastoma therapy, including surgical
resection, whole-brain and spinal cord radiation, and aggressive
chemotherapy supplemented by bone marrow transplant, yields five-year
survival rates of 60-70%. Survivors are often left with significant
neurological, intellectual and physical disabilities secondary to the
effects of these nonspecific cytotoxic therapies on the developing brain.

Recent evidence suggests that medulloblastoma actually comprises
multiple molecularly distinct entities whose clinical and genetic
differences may require separate therapeutic strategies. Four principal
subgroups of medulloblastoma have been identified: WNT, SHH,
Group 3 and Group 4 (ref. 7), and there is preliminary evidence for
clinically significant subdivisions of the subgroups. Rational,
targeted therapies based on genetics are not currently in use for
medulloblastoma, although inhibitors of the Sonic Hedgehog
pathway protein Smoothened have shown early promise. Actionable
targets for WNT, Group 3 and Group 4 tumours have not been
identified. Sanger sequencing of 22 medulloblastoma exomes
revealed on average only 8 single nucleotide variants (SNVs) per
tumour. Some SNVs were subgroup-restricted (PTCH1, CTNNB1),
whereas others occurred across subgroups (TP53, MLL2). We
proposed that the observed intertumoural heterogeneity might have
underpowered prior attempts to discover targets for rational therapy.

The Medulloblastoma Advanced Genomics International
Consortium (MAGIC), consisting of scientists and physicians from
46 cities across the globe, gathered more than 1,200 medulloblastomas,
which were studied by SNP arrays (n = 1,239; Fig. 1a, Supplementary
Fig. 1 and Supplementary Tables 1-3). Medulloblastoma subgroup
affiliation of 827 cases was determined using a custom nanoString-based
RNA assay (Supplementary Fig. 2). Disparate patterns of
broad cytogenetic gain and loss were observed across the subgroups
(Fig. 1b and Supplementary Figs 3, 7, 8, 10 and 11). Analysis of the
entire cohort using GISTIC2 (ref. 13) to discover significant 'driver'
events delineated 62 regions of recurrent SCNA (Fig. 1c,
Supplementary Fig. 4 and Supplementary Tables 4 and 5); analysis
by subgroup increased sensitivity such that 110 candidate 'driver'
SCNAs were identified, most of which are subgroup-enriched
(Fig. 1c-e and Supplementary Table 6).


Twenty-eight regions of recurrent high-level amplification (copy
number ≥ 5) were identified (Fig. 1d and Supplementary Table 7).
The most prevalent amplifications affected members of the MYC
family, with MYCN predominantly amplified in SHH and Group 4,
MYC in Group 3, and MYCL1 in SHH medulloblastomas. Multiple
genes/regions were exclusively amplified in SHH, including GLI2,
MYCL1, PPM1D, YAP1 and MDM4 (Fig. 1d). Recurrent homozygous
deletions were exceedingly rare, with only 15 detected across 1,087
tumours (Fig. 1e). Homozygous deletions targeting the known tumour
suppressors PTEN, PTCH1 and CDKN2A/B were the most common,
all enriched in SHH cases (Fig. 1e and Supplementary Table 7). Novel
homozygous deletions included KDM6A, a histone-lysine demethylase
deleted in Group 4. A custom nanoString CodeSet was used to verify 24
significant regions of gain across 192 MAGIC cases, resulting in a
verification rate of 90.9% (Supplementary Fig. 5). We conclude that
SCNAs in medulloblastoma are common, and are predominantly
subgroup-enriched.


Subgroup-specific SCNAs in medulloblastoma 

WNT medulloblastoma genomes are depleted of recurrent focal
regions of SCNA, exhibiting no significant regions of deletion and
only a small subset of focal gains found at comparable frequencies in
non-WNT tumours (Supplementary Figs 4 and 6 and Supplementary
Table 8). CTNNB1 mutational screening confirmed canonical exon
3 mutations in 63 out of 71 (88.7%) WNT tumours, whereas monosomy
6 was detected in 58 out of 76 (76.3%) (Supplementary Fig. 6 and
Supplementary Table 9). Four WNT tumours (4/71; 5.6%) had neither
CTNNB1 mutation nor monosomy 6, but maintained typical WNT
expression signatures. Given the size of our cohort and the resolution
of the platform, we conclude that there are no frequent, targetable
SCNAs for WNT medulloblastoma.

Figure 1 | Genomic heterogeneity of medulloblastoma subgroups. a, The
medulloblastoma genome classified by subgroup. b, Frequency and significance
(Q value ≤ 0.1) of broad cytogenetic events across medulloblastoma subgroups.
c, Significant regions of focal SCNA identified by GISTIC2 in either pan-cohort
or subgroup-specific analyses. d, e, Recurrent high-level amplifications

Figure 2 | Genomic alterations affect core signalling pathways in SHH
medulloblastoma. a, GISTIC2 significance plot of amplifications (red) and
deletions (blue) observed in SHH. The number of genes mapping to each
significant region is included in brackets, and regions enriched in SHH are
shaded red. b, c, Recurrent amplifications of PPM1D (b) and PIK3C2B/MDM4
(c) are restricted to SHH. d, Fluorescence in situ hybridization (FISH)
focal and broad (parentheses) SCNAs are listed. f, Mutual exclusivity analysis of
focal SCNAs in SHH. g, Clinical implications of SCNAs affecting MYCN, GLI2
or PTCH1 in SHH (log-rank tests).

SHH tumours exhibit multiple significant focal SCNAs (Fig. 2a,
Supplementary Figs 12, 15, 16 and Supplementary Tables 10 and 11).
SHH-enriched/restricted SCNAs included amplification of GLI2 and
deletion of PTCH1 (Fig. 2a, e, f). MYCN and CCND2 were among
the most frequently amplified genes in SHH (Supplementary
Table 6), but were also altered in non-SHH cases. Genes upregulated
in SHH tumours (that is, SHH signature genes) are significantly
over-represented among the genes focally amplified in SHH tumours
(P = 0.001-0.02, permutation tests; Supplementary Fig. 9). Recurrent
amplification of SHH signature genes has clinical implications, as
amplification of downstream transcriptional targets could mediate
resistance to upstream SHH pathway inhibitors.
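The over-representation claim above rests on permutation tests. The sketch below shows one common way such a test can be set up; it is an illustration, not necessarily the procedure used in this study. It assumes a fixed gene universe, draws random gene sets of the same size as the focally amplified set, and reports how often the random overlap with the signature is at least as large as the observed one. The function name `overlap_permutation_p` and the toy gene lists are hypothetical.

```python
import random


def overlap_permutation_p(signature, amplified, universe, n_perm=10_000, seed=0):
    """Empirical one-sided P value for the overlap of two gene sets.

    Hypothetical helper: random gene sets the same size as `amplified` are
    drawn (without replacement) from `universe`; the P value is the fraction
    of draws whose overlap with `signature` is at least the observed overlap.
    """
    signature = set(signature)
    amplified = set(amplified)
    universe = sorted(set(universe))  # fixed order so a given seed is reproducible
    observed = len(signature & amplified)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(universe, len(amplified))
        if len(signature.intersection(draw)) >= observed:
            hits += 1
    # The +1 terms count the observed set as one permutation, so P is never 0.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example (invented gene names): 1,000 genes, a 50-gene "signature",
# and a 40-gene "amplified" set that shares 20 genes with the signature.
universe = [f"gene{i}" for i in range(1000)]
signature = universe[:50]
amplified = universe[:20] + universe[500:520]
observed, p = overlap_permutation_p(signature, amplified, universe, n_perm=2000)
print(observed, p)  # 20 shared genes; P far below 0.01
```

Because the null distribution is built by resampling, the smallest attainable P value is 1/(n_perm + 1); more permutations are needed to resolve very small P values.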

Novel, SHH-enriched SCNAs affected components of TP53
signalling, including amplifications of MDM4 and PPM1D, and focal
deletions of TP53 (Fig. 2a-e). Targetable events, including
amplifications of the IGF signalling genes IGF1R and IRS2 and the PI3K
genes PIK3C2G and PIK3C2B, and deletion of PTEN, were restricted to SHH tumours
(Fig. 2a, c, e). Importantly, focal events affecting genes in the SHH
pathway were largely mutually exclusive and prognostically
significant (Fig. 2f, g). Many of the recurrent, targetable SCNAs identified in
SHH medulloblastoma (IGF1R, KIT, MDM4, PDGFRA, PIK3C2G,
PIK3C2B and PTEN) have already been targeted with small molecules
for treatment of other malignancies, which might allow rapid
translation to targeted therapy for subsets of SHH patients (Supplementary
Table 16). Novel SHH targets identified here are excellent candidates
for combinatorial therapy with Smoothened inhibitors, to avoid the
resistance encountered in both humans and mice.
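Figure 2f reports a mutual exclusivity analysis of focal SCNAs in SHH tumours, but the text does not say which statistic was used. One standard choice, assumed here, is a one-sided Fisher's exact test that asks whether two events co-occur in fewer tumours than expected if they were independent. Holding fixed the number of tumours carrying each event, the hypergeometric distribution gives the probability of every possible co-occurrence count. `exclusivity_p` is a hypothetical name, and the event vectors in the example are invented.

```python
from math import comb


def exclusivity_p(event_a, event_b):
    """One-sided Fisher's exact test for mutual exclusivity of two events.

    event_a, event_b: equal-length sequences of booleans, one entry per tumour.
    Returns (observed co-occurrences, P(co-occurrences <= observed)) under the
    hypergeometric null that the events are independent, given their margins.
    """
    if len(event_a) != len(event_b):
        raise ValueError("event vectors must describe the same tumours")
    n = len(event_a)
    n_a = sum(1 for x in event_a if x)
    n_b = sum(1 for y in event_b if y)
    both = sum(1 for x, y in zip(event_a, event_b) if x and y)
    # P(X = k) = C(n_a, k) * C(n - n_a, n_b - k) / C(n, n_b); math.comb returns 0
    # for impossible counts, so the sum needs no explicit lower bound.
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(both + 1))
    return both, tail / comb(n, n_b)


# Invented example: 40 tumours, event A in tumours 0-9, event B in tumours 10-19.
a = [i < 10 for i in range(40)]
b = [10 <= i < 20 for i in range(40)]
print(exclusivity_p(a, b))  # (0, ~0.035): no overlap, modest evidence with n = 40
```

The example shows why cohort size matters for exclusivity claims: even two perfectly non-overlapping events give only P ≈ 0.035 when each occurs in 10 of 40 tumours.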

Group 3 and Group 4 medulloblastomas have generic names because
comparatively little is known about their genetic basis, and no targets
for rational therapy have been identified. MYC amplicons are largely
restricted to Group 3, whereas MYCN amplicons are seen in Group 4
and SHH tumours (Fig. 1d). Indeed, MYC and MYCN loci comprise
the most significant regions of amplification observed in Group 3 and
Group 4, respectively (Fig. 3a, b, Supplementary Figs 13, 14, 17-20
and Supplementary Tables 12-15). Group 3 MYC amplicons were
mutually exclusive with those affecting the known medulloblastoma
oncogene OTX2 (ref. 16) and were highly prognostic (Supplementary
Fig. 21). The type II activin receptors ACVR2A and ACVR2B and the
family member TGFBR1 are highly amplified in Group 3 tumours,
indicating deregulation of TGF-β signalling as a driver event in Group
3 (Fig. 3c-e and Supplementary Fig. 22). The Group 3-enriched
medulloblastoma oncogene OTX2 is a prominent target of TGF-β
signalling in the developing nervous system (ref. 17), and the TGF-β pathway
inhibitors CD109 (ref. 18), FKBP1A (refs 19 and 20) and SNX6 (ref.
20) are recurrently deleted in Group 3 (Fig. 3a, d). SCNAs in TGF-β
pathway genes were heavily enriched in Group 3 (P = 5.37 X 10 °,
Fisher's exact test) and found in at least 20.2% of cases, indicating that
TGF-β signalling represents the first rational target for this
poor-prognosis subgroup (Fig. 3d). Similarly, novel deletions affecting
regulators of the NF-κB pathway, including NFKBIA (ref. 21) and
USP4 (ref. 22), were identified in Group 4 (Supplementary Fig. 23),
suggesting that NF-κB signalling may represent a rational Group 4
therapeutic target.
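Several enrichment claims in this section (for example, TGF-β pathway SCNAs in Group 3) are backed by Fisher's exact tests on 2×2 tables. The sketch below implements the usual two-sided version from the hypergeometric distribution: it sums the probabilities of all tables with the observed margins that are no more likely than the observed table. The function name is ours, and the example tables are textbook toy tables, not data from this study; in practice `scipy.stats.fisher_exact` computes the same quantity.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    For instance, rows could be Group 3 versus other tumours and columns
    "SCNA in a pathway gene" versus "no such SCNA".
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance keeps tables tied with the observed one
    # from being dropped because of floating-point rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


print(fisher_exact_two_sided(3, 1, 1, 3))  # ~0.486 (= 34/70)
print(fisher_exact_two_sided(4, 0, 0, 4))  # ~0.029 (= 2/70)
```

The exact test suits subgroup comparisons like these because some cells (for example, events in rare subgroups) hold only a handful of tumours, where chi-squared approximations are unreliable.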

Network analysis of Group 3 and Group 4 SCNAs illustrates the
different pathways over-represented in each subgroup. Only TGF-β
signalling is unique to Group 3 (Fig. 3e). In contrast, cell-cycle control,
chromatin modification and neuronal development are all Group
4-enriched. Cumulatively, the dismal prognosis of Group 3 patients,
the lack of published targets for rational therapy, and the prior
targeting of TGF-β signalling in other diseases suggest that TGF-β
may represent an appealing target for Group 3 rational therapies
(Supplementary Table 16).


SNCAIP tandem duplication is common in Group 4 


Although Group 4 is the most prevalent medulloblastoma subgroup, 
its pathogenesis remains poorly understood. The most frequent 
SCNA observed in Group 4 (33/317; 10.4%) is a recurrent region of 
single copy gain on chr5q23.2 targeting a single gene, SNCAIP 
(synuclein, alpha interacting protein) (Fig. 4a and Supplementary 
Fig. 24). SNCAIP encodes synphilin-1, which binds to α-synuclein
to promote the formation of Lewy bodies in the brains of patients with
Parkinson's disease. Additionally, rare germline mutations of
SNCAIP have been described in Parkinson's families. Large-insert,
mate-pair whole-genome sequencing (WGS) demonstrates that
SNCAIP copy number gains arise from tandem duplication of a
truncated SNCAIP (lacking non-coding exon 1), inserted telomeric
to the germline SNCAIP allele (Fig. 4b, c and Supplementary Fig. 25).
Affymetrix SNP6 array profiling of patient-matched germline material
confirmed that SNCAIP duplications are somatic (Supplementary Fig.
26), and subsequent whole-transcriptome sequencing (RNA-Seq) of
selected Group 4 cases (n = 5) verified that SNCAIP is the only gene
expressed in the duplicated region (Supplementary Fig. 27). Analysis
of published copy number profiles for 3,131 primary tumours and
947 cancer cell lines (total of 4,078 cases) revealed only four cases with
apparent duplication of SNCAIP, all of which were inferred to be Group 4
medulloblastomas (data not shown). We conclude that SNCAIP
duplication is a somatic event highly specific to Group 4 medulloblastoma.

Re-analysis of 499 published medulloblastoma expression profiles
confirmed that SNCAIP is one of the most highly upregulated Group 4
signature genes (Fig. 4d and Supplementary Fig. 28). Profiling of 188
Group 4 tumours on expression microarrays followed by consensus
non-negative matrix factorization (NMF) clustering delineates two
subtypes of Group 4 (4α and 4β; Fig. 4e and Supplementary Fig. 29).
Strikingly, 21 out of 22 SNCAIP-duplicated cases belonged to Group 4α
(P = 3.12 X 10 *, Fisher's exact test). SNCAIP is more highly
expressed in Group 4α than 4β (Fig. 4f), and 4α samples with tandem
duplication showed approximately 1.5-fold increased expression,
consistent with gene dosage (Fig. 4g and Supplementary Figs 35 and 36).
Group 4α exhibits a relatively balanced genome compared to 4β
(Supplementary Figs 30-32), and several 4α cases harbour SNCAIP
duplication in conjunction with i17q and no other SCNAs
(Supplementary Fig. 33). Importantly, SNCAIP duplications are mutually
exclusive with other prominent SCNAs in Group 4, including MYCN
and CDK6 amplifications (Supplementary Fig. 34).

Figure 3 | The genomic landscape of Group 3 and Group 4
medulloblastoma. a, b, GISTIC2 plots depicting significant SCNAs in Group 3
(a) and Group 4 (b), with subgroup-enriched regions shaded in yellow and
green, respectively. c, Recurrent amplifications targeting type II (ACVR2A and
ACVR2B) and type I (TGFBR1) activin receptors in Group 3. d, Recurrent
SCNAs affecting the TGF-β pathway in Group 3 (P = 5.73 X 10°, Fisher's
exact test). Frequencies of focal and broad (parentheses) SCNAs are listed.
e, Enrichment map of gene sets affected by SCNAs in Group 3 versus Group 4.

Figure 4 | Tandem duplication of SNCAIP defines a novel subtype of Group
4. a, Highly recurrent, focal, single-copy gain of SNCAIP in Group 4. b,
Paired-end mapping verifies recurrent tandem duplication of SNCAIP in Group 4.
c, Schematic representation of SNCAIP tandem duplication. d, SNCAIP is a
Group 4 signature gene. Upper panel, SNCAIP expression across subgroups in a
published series of 103 primary medulloblastomas. Error bars depict the
minimum and maximum values, excluding outliers. Lower panel, SNCAIP
ranks among the top 1% (rank, 39th out of 16,758) of highly expressed genes in
Group 4. e, NMF consensus clustering of 188 expression-profiled Group 4
tumours supports two transcriptionally distinct subtypes designated 4α and 4β
(cophenetic coefficient = 0.9956). 21 out of 22 SNCAIP-duplicated cases belong
to Group 4α (P = 3.12 x 10°, Fisher's exact test). f, SNCAIP expression is
significantly elevated in Group 4α versus 4β (P = 9.31 X 10° 4,
Mann-Whitney test). g, Group 4α cases harbouring SNCAIP duplication exhibit a
~1.5-fold increase in SNCAIP expression. f, g, Error bars depict the minimum
and maximum values, excluding outliers.


PVT1 fusions arise via chromothripsis in Group 3


Although recurrent gene fusions have recently been discovered in solid
tumours, none have been reported in medulloblastoma. RNA-Seq of
Group 3 tumours (n = 13) identified two independent gene fusions in
two different tumours (MB-182 and MB-586), both involving the 5’ 
end of PVT1, a non-coding gene frequently co-amplified with MYC in 
Group 3 (Fig. 5a, b, Supplementary Fig. 37 and Supplementary Tables 
17 and 18). Sanger sequencing confirmed a fusion transcript consisting 
of exons 1 and 3 of PVT1 fused to the coding sequence of MYC (exons 2
and 3) in MB-182, and a fusion involving PVT1 exon 1 fused to the 3’ 
end of NDRG1 in MB-586 (Fig. 5a, b). 

Group 3 copy number data at the MYC/PVT1 locus indicated that 
additional samples might harbour PVT1 gene fusions (Fig. 5c). PCR 
with reverse transcription (RT-PCR) profiling of selected Group 3 cases
confirmed PVT1-MYC fusions in at least 60% (12/20) of MYC-amplified
cases (Fig. 5d and Supplementary Table 19). Fusion
transcripts included many other portions of chr8q, with up to four
different genomic loci mapping to a single transcript, a pattern
reminiscent of chromothripsis (Fig. 5d). WGS performed on four MYC-amplified
Group 3 tumours harbouring PVT1 fusion transcripts
identified a series of complex genomic rearrangements on chr8q
(Fig. 5e, f, Supplementary Fig. 38 and Supplementary Tables 20 and
21). The chromosome 8 copy number profile for MB-586 (PVT1-NDRG1)
derived from WGS showed that PVT1 and NDRG1 are
structurally linked, as predicted by RNA-Seq, and that several adjacent
regions of 8q24 were extensively rearranged (Fig. 5e, f and
Supplementary Table 21). Monte Carlo simulation suggests that this
fragmented 8q amplicon arose through chromothripsis, a process of 
erroneous DNA repair following a single catastrophic event in which a 
chromosome is shattered into many pieces (Supplementary Fig. 39). 
Further examination of our copy number data set revealed rare 
examples of chromothripsis across subgroups (Supplementary Fig. 40), 
with only chr8 in Group 3 demonstrating statistically significant, region- 
specific chromothripsis (Q = 0.0004, false discovery rate (FDR)- 
corrected Fisher’s exact test). Among Group 3 tumours, the occurrence 
of chr8q chromothripsis is correlated with deletion of chr17p (location 
of TP53; data not shown), in keeping with the association of loss of 
TP53 and chromothripsis recently described in medulloblastoma
(P = 0.0199, Fisher's exact test). Although the PVT1 locus has been
suggested to be a genomically fragile site, we observe that the majority
of MYC-amplified Group 3 tumours harbour PVT1 fusions that arise
through a process consistent with chromothripsis.

PVT1 is a non-coding host gene for four microRNAs, miR-1204 to
miR-1207. Previous studies have implicated miR-1204 as a candidate
oncogene that enhances oncogenesis in combination with MYC.
PVT1 fusions identified in this study involve only PVT1 exon 1 and
miR-1204. Importantly, miR-1204, but not the adjacent miR-1205 and
miR-1206, is expressed at a higher level in PVT1-MYC fusion (+)
Group 3 tumours than in fusion (−) cases (P = 0.0008,
Mann-Whitney test; Fig. 6a). To evaluate whether aberrant expression of
miR-1204 contributes to the malignant phenotype, we inhibited
miR-1204 in MED8A cells, a Group 3 medulloblastoma cell line with a
confirmed PVT1-MYC fusion (Fig. 5d). Antagomir-mediated RNA
interference of miR-1204 had a pronounced effect on MED8A growth
(Fig. 6b). A comparable reduction in proliferative capacity was
achieved with knockdown of MYC. Conversely, the medulloblastoma
cell line ONS76 exhibits neither MYC amplification nor a detectable
PVT1-MYC fusion gene, and knockdown of miR-1204 had no effect in
this line (Fig. 6c).

Figure 5 | Identification of frequent PVT1-MYC fusion genes in Group 3.
a, b, RNA-Seq identifies multiple fusion transcripts driven by PVT1 in Group 3.
Schematics depict the structures of verified PVT1-MYC (a) and PVT1-NDRG1
(b) fusion genes. c, Heat map of the MYC/PVT1 locus showing a subset of 13
MYC-amplified Group 3 cases subsequently verified to exhibit PVT1 gene
fusions (shown in d). Yellow box highlights the common breakpoint affecting
the first exon/intron of PVT1, including miR-1204. d, Summary of PVT1 fusion
transcripts identified in Group 3. e, f, WGS confirms complex patterns of
rearrangement on chr8q24 in PVT1 fusion (+) Group 3.

Figure 6 | Functional synergy between miR-1204 and MYC secondary to
PVT1-MYC fusion. a, Quantitative RT-PCR of PVT1-encoded microRNAs
confirms upregulation of miR-1204 in PVT1-MYC fusion (+) Group 3 tumours.
MYC-balanced/fusion (−), n = 4; MYC-amplified/fusion (−), n = 6;
MYC-amplified/fusion (+), n = 8. Error bars represent standard error of the mean
(s.e.m.) and reflect variability among samples. b, c, Knockdown of miR-1204
attenuates the proliferative capacity of PVT1-MYC fusion (+) MED8A
medulloblastoma cells (b) but has no effect on fusion (−) ONS76 cells (c). Error
bars represent the standard deviation (s.d.) of triplicate experiments. CTL, control.
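The miR-1204 comparison in Fig. 6a uses a Mann-Whitney test on small groups (n = 4 to 8 per group in the legend). With samples this small, an exact P value can be obtained by enumerating every way of splitting the pooled values into two groups of the original sizes. The sketch below does this, counting ties as one-half; the function names are hypothetical and the example values are invented, not data from Fig. 6. For larger samples, `scipy.stats.mannwhitneyu` is the usual library route.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


def exact_mann_whitney_p(x, y):
    """Exact two-sided P value by enumerating every relabelling of the pooled data.

    Feasible only for small samples: there are C(n_x + n_y, n_x) relabellings.
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    centre = n_x * len(y) / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        # Count relabellings at least as far from the null centre as observed.
        if abs(mann_whitney_u(xs, ys) - centre) >= observed - 1e-9:
            extreme += 1
    return extreme / total


# Invented expression values for two small groups.
fusion_pos = [5.1, 6.3, 7.0]
fusion_neg = [1.2, 2.4, 3.3]
print(mann_whitney_u(fusion_pos, fusion_neg))        # 9.0: every pos value exceeds every neg value
print(exact_mann_whitney_p(fusion_pos, fusion_neg))  # 0.1 = 2 of 20 relabellings
```

The toy output illustrates a property of rank tests on tiny groups: with three samples per group, even complete separation cannot give a two-sided P value below 0.1.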

PVT1 has been reported previously in fusion transcripts with a
number of partners. The most prevalent form of the PVT1-MYC
fusion in Group 3 tumours lacks the first, non-coding exon of
MYC, similar to forms of MYC that have been described in Burkitt's
lymphoma (Fig. 5a, d). The PVT1 promoter contains two
non-canonical E-boxes and can be activated by MYC. This suggests a
positive feedback model in which MYC can reinforce its own expression
from the PVT1 promoter in PVT1-MYC fusion (+) tumours. Indeed,
knockdown of MYC alone in MED8A cells resulted in diminished
expression of both MYC and miR-1204, suggesting that MYC may positively
regulate PVT1 (that is, miR-1204) expression in medulloblastoma cells
(Supplementary Fig. 41).


Discussion 


Medulloblastomas have few SNVs compared to many adult epithelial
malignancies, whereas SCNAs seem to be quite common.
Medulloblastoma is a heterogeneous disease, thereby requiring large
cohorts to detect subgroup-specific events. Through the
accumulation of >1,200 medulloblastomas in MAGIC, we have identified
novel and significant SCNAs. Many of the significant SCNAs are
subgroup-restricted, strongly supporting their role as driver events in
their respective subgroups.

Expression of synphilin-1 in neuronal cells results in decreased cell
doubling time, decreased caspase-3 activation, decreased TP53
transcriptional activity and messenger RNA levels, and decreased
apoptosis. Synphilin-1 is ubiquitinated by parkin, which is encoded
by the hereditary Parkinson's disease gene PARK2 (ref. 24), a candidate
tumour suppressor gene. Although patients with Parkinson's disease
have an overall decreased risk of cancer, they may have an increased
incidence of brain tumours. As tandem duplications of SNCAIP are
highly recurrent, stereotypical and subgroup-restricted, and affect only a single
gene, and as SNCAIP-duplicated tumours have few if any other
SCNAs, SNCAIP is a probable driver gene and merits investigation
as a target for therapy of Group 4α. Similarly, PVT1 fusion genes are
highly recurrent, restricted to Group 3, arise through a chromothripsis-like
process, and are the first recurrent translocation reported in
medulloblastoma.

We identify a number of highly targetable, recurrent, subgroup- 
specific SCNAs that could form the basis for future clinical trials (that 
is, PI3K signalling in SHH, TGF-β signalling in Group 3, and NF-κB
signalling in Group 4). Activation of these pathways through alterna- 
tive, currently unknown genetic and epigenetic events could increase 
the percentage of patients amenable to targeted therapy. We also 
identify a number of highly ‘druggable’ events that occur in a minority 
of cases. The cooperative, global approach of the MAGIC consortium 
has allowed us to overcome the barrier of intertumoural heterogeneity 
in an uncommon paediatric tumour, and to identify the relevant and 
targetable SCNAs for the affected children. 


METHODS SUMMARY 


All patient samples were obtained with consent as outlined by individual 
institutional review boards. Genomic DNA was prepared, processed and 
hybridized to Affymetrix SNP6 arrays according to manufacturer’s instructions. 
Raw copy number estimates were obtained in dChip, followed by CBS segmenta- 
tion in R. SCNAs were identified using GISTIC2 (ref. 13). Driver genes within 
SCNAs were inferred by integrating matched expression data, literature evidence and
other data sets. Pathway enrichment of SCNAs was analysed with g:Profiler and
visualized in Cytoscape using Enrichment Map. Fluorescence in situ
hybridization (FISH) was performed as described previously. Medulloblastoma
subgroup was assigned using a custom nanoString CodeSet as described
previously. Tandem duplication of SNCAIP was confirmed by paired-end mapping
as previously reported. RNA was extracted, processed and hybridized to
Affymetrix Gene 1.1 ST Arrays as recommended by the manufacturer.
Consensus NMF clustering was performed in GenePattern. Gene fusions were
identified from RNA-Seq data using Trans-ABySS. Medulloblastoma cell lines
were maintained as described. Proliferation assays were performed with the
Promega CellTiter 96 Assay. Additional methods are detailed in full in 
Supplementary Methods. 
Quantum nonlinear optics with single photons
enabled by strongly interacting atoms 


Thibault Peyronel, Ofer Firstenberg, Qi-Yu Liang, Sebastian Hofferberth, Alexey V. Gorshkov, Thomas Pohl,
Mikhail D. Lukin & Vladan Vuletić


The realization of strong nonlinear interactions between individual
light quanta (photons) is a long-standing goal in optical science and
engineering, being of both fundamental and technological
significance. In conventional optical materials, the nonlinearity at light
powers corresponding to single photons is negligibly weak. Here we
demonstrate a medium that is nonlinear at the level of individual
quanta, exhibiting strong absorption of photon pairs while
remaining transparent to single photons. The quantum nonlinearity is
obtained by coherently coupling slowly propagating photons to
strongly interacting atomic Rydberg states in a cold, dense
atomic gas. Our approach paves the way for quantum-by-quantum
control of light fields, including single-photon switching,
all-optical deterministic quantum logic and the realization
of strongly correlated many-body states of light.

Recently, remarkable advances have been made towards optical
systems that are nonlinear at the level of individual photons. The most
promising approaches have used high-finesse optical cavities to
enhance the atom-photon interaction probability. In contrast,
our present method is cavity-free and is based on mapping photons
onto atomic states with strong interactions in an extended atomic
ensemble. The central idea is illustrated in Fig. 1, where a
quantum probe field incident onto a cold atomic gas is coupled to
high-lying atomic states (Rydberg levels) by means of a second,
stronger laser field (control field). For a single incident probe photon,
the control field induces a spectral transparency window in the
otherwise opaque medium via electromagnetically induced
transparency (EIT), and the probe photon travels at much reduced speed
in the form of a coupled excitation of light and matter (a Rydberg
polariton). However, in stark contrast to conventional EIT, if two
probe photons are incident onto the Rydberg medium, the strong
interaction between two Rydberg atoms tunes the transition out of
resonance, thereby destroying the transparency and leading to
absorption. The experimental demonstration of an optical material
exhibiting strong two-photon attenuation in combination with
single-photon transmission is the central result of this work.

The quantum nonlinearity can be viewed as a photon-photon
blockade mechanism that prevents the transmission of any multiphoton
state. It arises from the Rydberg excitation blockade, which
precludes the simultaneous excitation of two Rydberg atoms that are
separated by less than a blockade radius, $r_b$ (Fig. 1). During the optical
excitation under EIT conditions, an incident single photon is
converted into a Rydberg polariton inside the medium. However, owing
to the Rydberg blockade, a second polariton cannot travel within a
blockade radius of the first one, and EIT is destroyed. Accordingly,
if the second photon approaches the single Rydberg polariton, it will
be significantly attenuated, provided that $r_b$ exceeds the resonant
attenuation length of the medium in the absence of EIT,
$l_a = (N\sigma_a)^{-1}$, where $N$ is the peak atomic density and $\sigma_a$ the
absorption cross-section. This simple physical picture implies that, in the
regime where the blockade radius exceeds the absorption length
($r_b \gtrsim l_a$), two photons in a tightly focused beam not only cannot pass
through each other, but also cannot propagate close to each other
inside the medium (see Fig. 1c and the detailed theoretical analysis
below). Using Rydberg states with principal quantum numbers
$46 \le n \le 100$, we can realize blockade radii $r_b$ between 3 μm and 13 μm,
while for our highest atomic densities of $N = 2 \times 10^{12}\ \mathrm{cm}^{-3}$, the
attenuation length $l_a$ is below 2 μm. The optical medium then acts as a
quantum nonlinear absorption filter, converting incident laser light
into a non-classical train of single-photon pulses.
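As a quick numerical check of the condition $r_b \gtrsim l_a$, the sketch below evaluates $l_a = (N\sigma_a)^{-1}$ and the optical depth per blockade radius, $\mathrm{OD}_b = r_b/l_a$. The cross-section is our assumption, not a value given in the text: we take the resonant two-level value $\sigma_a = 3\lambda^2/2\pi$ at $\lambda \approx 780$ nm, appropriate for a closed cycling transition. With the quoted peak density this gives $l_a$ of about 1.7 μm, in line with the "below 2 μm" stated above.

```python
import math

# Assumption (not from the text): Rb D2 wavelength and the resonant two-level
# cross-section 3*lambda^2/(2*pi), which applies to a closed cycling transition.
WAVELENGTH_CM = 780.24e-7


def resonant_cross_section_cm2(wavelength_cm=WAVELENGTH_CM):
    """Resonant absorption cross-section sigma_a, in cm^2."""
    return 3.0 * wavelength_cm ** 2 / (2.0 * math.pi)


def attenuation_length_um(density_cm3, sigma_cm2):
    """Resonant attenuation length l_a = 1 / (N * sigma_a), in micrometres."""
    return 1.0e4 / (density_cm3 * sigma_cm2)  # 1 cm = 1e4 um


def optical_depth_per_blockade(r_b_um, l_a_um):
    """OD_b = r_b / l_a: resonant optical depth across one blockade radius."""
    return r_b_um / l_a_um


l_a = attenuation_length_um(2e12, resonant_cross_section_cm2())  # peak density from the text
print(f"l_a  = {l_a:.2f} um")                                    # about 1.72 um
print(f"OD_b = {optical_depth_per_blockade(13.0, l_a):.1f}")     # r_b = 13 um (n = 100): about 7.6
```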

EIT nonlinearities at the few-photon level have been previously
observed without using strongly interacting atomic states by means
of strong transverse confinement of the light. The interactions
between cold Rydberg atoms have been explored in ensembles,
and have been used to realize quantum logic gates between two
Rydberg atoms. Enhanced optical nonlinearities using Rydberg
EIT have been observed in pioneering work that we are building
on. Very recently, the Rydberg blockade in a dense, mesoscopic
atomic ensemble has been used to implement a deterministic single-photon
source.

To observe the photon-photon blockade, several key requirements
must be fulfilled. First, to eliminate Doppler broadening, the atoms
should be cold so that they move by less than an optical wavelength
during the microsecond lifetime of the EIT coherence. Second, the
atomic cloud should be sufficiently dense that the blockade condition
$r_b \gtrsim l_a$ is fulfilled. Last, the system should be one-dimensional;
that is, the transverse size of the probe beam should be smaller than the
blockade radius $r_b$, to prevent polaritons from travelling side
by side.
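The blockade radius follows from the definition in the Fig. 1 caption, $V(r_b) = \hbar\gamma_{\mathrm{EIT}}/2$ with $V(r) = \hbar C_6/r^6$, so $r_b = (2C_6/\gamma_{\mathrm{EIT}})^{1/6}$. The sketch below computes $r_b$ and checks the two requirements above that reduce to length comparisons. The helper `check_requirements` is hypothetical, and the $C_6$ value is a placeholder chosen only to make the numbers concrete; it is not a measured coefficient. The value of $\gamma_{\mathrm{EIT}}$ is the one quoted in Fig. 2.

```python
import math


def blockade_radius_um(c6, gamma_eit):
    """r_b = (2*C6/gamma_EIT)**(1/6), i.e. the distance where C6/r**6 = gamma_EIT/2.

    With c6 in rad/s * um^6 and gamma_eit in rad/s, r_b comes out in micrometres.
    """
    return (2.0 * c6 / gamma_eit) ** (1.0 / 6.0)


def check_requirements(r_b_um, l_a_um, waist_um):
    """Hypothetical helper: the density and one-dimensionality requirements.

    The first requirement (cold atoms) is not a length comparison and is not
    checked here.
    """
    return {
        "dense: r_b >= l_a": r_b_um >= l_a_um,
        "one-dimensional: w < r_b": waist_um < r_b_um,
    }


# gamma_EIT = 2*pi*23 MHz as in Fig. 2; C6 below is a hypothetical placeholder,
# not a measured coefficient for any Rydberg state.
gamma_eit = 2 * math.pi * 23e6
c6 = 2 * math.pi * 5.5e13
r_b = blockade_radius_um(c6, gamma_eit)
print(f"r_b = {r_b:.1f} um")
print(check_requirements(r_b, l_a_um=1.7, waist_um=4.5))
```

Because $r_b$ depends on $C_6/\gamma_{\mathrm{EIT}}$ only through a sixth root, even order-of-magnitude changes in the control-field linewidth move $r_b$ by less than a factor of 1.5.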

We fulfil these conditions by trapping a dense laser-cooled atomic
ensemble and focusing the probe beam to a Gaussian waist
$w = 4.5\ \mu\mathrm{m} < r_b$ (see Methods). We prepare an $^{87}$Rb ensemble containing up to
$10^5$ atoms in a far-detuned optical dipole trap produced by an Nd:YAG
laser operating at 1,064 nm with a total power of 5 W. The trap is
formed by two orthogonally polarized beams with waists of 50 μm
intersecting at an angle of 32°. The atoms are optically pumped
into the hyperfine ($F$) and magnetic ($m_F$) sublevel
$|g\rangle = |5S_{1/2}, F = 2, m_F = 2\rangle$ in the presence of a 3.6 G magnetic field
along the quantization axis, which is defined by the common propagation direction of the
probe and control beams along the long axis of the cloud. The probe
beam on the $|g\rangle \rightarrow |e\rangle = |5P_{3/2}, F = 3, m_F = 3\rangle$ transition and the
control beam on the $|e\rangle \rightarrow |r\rangle = |nS_{1/2}, J = 1/2, m_J = 1/2\rangle$ transition
(waist $w_c = 12.5\ \mu\mathrm{m}$) are oppositely circularly polarized. (Here
$J$ and $m_J$ denote the quantum numbers for total angular momentum
and its component along the quantization axis, respectively.) The
resonant optical depth (OD) of the cloud can be as large as
OD = 50, with in-trap radial and axial r.m.s. cloud dimensions of
$\sigma_r = 10\ \mu\mathrm{m}$ and $\sigma_z = 36\ \mu\mathrm{m}$, respectively. To avoid inhomogeneous
light-shift broadening of the two-photon transition, we turn off the
Figure 1 | Rydberg-blockade-mediated interaction between slow photons.
a, b, An elongated ensemble of laser-cooled rubidium atoms is prepared in a
crossed optical-dipole trap. Co-propagating control and probe fields couple the
ground state $|g\rangle$ to a high-lying Rydberg state $|r\rangle$ via a short-lived excited state
$|e\rangle$. Under EIT conditions, the probe photons slowly propagate in the medium
as Rydberg polaritons. The Rydberg-Rydberg interaction of strength $C_6$
between atoms at a distance $r$, $V(r) = \hbar C_6/r^6$, shifts the Rydberg levels out of
resonance and blocks simultaneous Rydberg excitations at close range.
Consequently, two Rydberg polaritons cannot both propagate when they are
closer than the blockade radius $r_b = (2C_6/\gamma_{\mathrm{EIT}})^{1/6}$, set by $V(r_b) = \hbar\gamma_{\mathrm{EIT}}/2$, where
$\gamma_{\mathrm{EIT}} = \Omega_c^2/\Gamma$ is the single-atom EIT linewidth as set by the control-field Rabi
frequency $\Omega_c$ and the decay rate $\Gamma$ of state $|e\rangle$. c, d, Numerical simulations
showing the spatial evolution of the probability distribution associated with two
photons (c) and two Rydberg excitations (d) at positions $(z_1, z_2)$ inside the
medium, normalized by their values in the absence of blockade. Two Rydberg
excitations are excluded from the blockaded range, resulting in the formation of
an antibunching feature in the light field, whose width increases during
propagation owing to the finite EIT transparency width $B = \gamma_{\mathrm{EIT}}/\sqrt{8\,\mathrm{OD}}$.


optical dipole trap before turning on the probe and control light, and
probe the Rydberg EIT system continuously for up to a few hundred
microseconds. The control light is filtered out from the transmitted
light, and the photon-photon correlation function $g^{(2)}(\tau)$ of the probe
beam is measured versus the time separation $\tau$ by means of two photon
counters. The slow-light group delay $\tau_d$ through the atomic medium is
measured independently in a pulsed experiment, and used to calculate
the corresponding minimum group velocity, $v_g = \sqrt{2\pi}\,\sigma_z/\tau_d$.
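The factor $\sqrt{2\pi}\,\sigma_z$ is the effective length of a Gaussian cloud. Assuming (our reading) that the slowdown is proportional to the local density, so $v_g(z) = v_{\min} N_0/N(z)$, the delay integrates to $\tau_d = \int \mathrm{d}z / v_g(z) = \sqrt{2\pi}\,\sigma_z / v_{\min}$. The sketch evaluates the closed form and cross-checks it by direct numerical integration; pairing the in-trap $\sigma_z$ with the Fig. 2 delay is only an illustration.

```python
import math


def min_group_velocity(sigma_z, tau_d):
    """Peak-density group velocity v_g = sqrt(2*pi) * sigma_z / tau_d."""
    return math.sqrt(2.0 * math.pi) * sigma_z / tau_d


def delay_through_gaussian_cloud(v_min, sigma_z, n_steps=4000, half_span=8.0):
    """Numerically integrate tau_d = integral dz / v_g(z) with the trapezoid rule.

    Assumes v_g(z) = v_min * N0 / N(z) for a Gaussian density N(z) of
    r.m.s. length sigma_z (slowdown proportional to local density).
    """
    z0 = -half_span * sigma_z
    dz = 2.0 * half_span * sigma_z / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = z0 + k * dz
        inv_vg = math.exp(-0.5 * (z / sigma_z) ** 2) / v_min
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * inv_vg * dz
    return total


v_g = min_group_velocity(36e-6, 250e-9)   # sigma_z = 36 um, tau_d = 250 ns
print(f"v_g = {v_g:.0f} m/s")              # about 361 m/s
```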

Probe transmission spectra are presented in Fig. 2a for large optical
depth OD = 40 and the control laser tuned to the Rydberg state
$|100S_{1/2}\rangle$. At very low incident photon rates $R_i < 1\ \mu\mathrm{s}^{-1}$, the spectrum
displays an EIT transparency window with 60% transmission. The transmission
is mainly limited by the finite EIT decoherence rate $\gamma_{gr}$, which
for our system is dominated by Doppler broadening and laser linewidth.
The extraordinary nonlinearity of the Rydberg EIT medium becomes
apparent as the incident photon rate is increased: the probe beam is
already strongly attenuated at a photon rate of $R_i \approx 4\ \mu\mathrm{s}^{-1}$. To demonstrate
that we are operating in a quantum nonlinear regime, we show in
Figure 2 | Two-photon optical nonlinearity. a, Transmission versus probe
detuning at various incoming photon rates (in μs$^{-1}$): $R_i$ = 1, 2, 4, 6 (dashed
green, solid red, dotted blue and dot-dashed black, respectively) for $|100S_{1/2}\rangle$,
EIT linewidth $\gamma_{\mathrm{EIT}} = 2\pi \times 23$ MHz, optical depth OD = 40, and measured
group delay $\tau_d$ = 250 ns. The system is strongly nonlinear at a power as low as
0.25 pW. b, Data points show the photon-photon correlation function $g^{(2)}(\tau)$ at
EIT resonance for the same parameters as in a with $R_i = 1.2\ \mu\mathrm{s}^{-1}$. The top axis
shows the spatial separation $v_g\tau$ of polaritons with $v_g \approx 400$ m s$^{-1}$. Error bars,
1σ statistical uncertainty. Spurious detection events set a lower bound on $g^{(2)}$ of
0.09(3) (red dotted line). Inset, $g^{(2)}(\tau)$ for the less strongly interacting state
$|46S_{1/2}\rangle$ with similar parameters. The solid lines in the main panel and inset are
theoretical calculations as described in the text, with the probe waist fixed at
w = 6 μm. Values $g^{(2)} > 1$ are attributed to classical fluctuations (see
Supplementary Fig. 4 and Supplementary Information).


Fig. 2b the correlation function $g^{(2)}(\tau)$ of the transmitted probe light,
measured at $R_i = 1.2\ \mu\mathrm{s}^{-1}$. For the most strongly interacting state
$|100S_{1/2}\rangle$ with $r_b = 13\ \mu\mathrm{m} \approx 5 l_a \approx 2.9 w$, we observe strong antibunching
with $g^{(2)}(0) = 0.13(2)$, largely limited by background light. (Here 0.13(2)
indicates 0.13 ± 0.02.) Subtraction of the independently measured background
coincidence counts yields a corrected $g^{(2)}(0) = 0.04(3)$. These
observations are in sharp contrast to EIT transmission via the less strongly
interacting Rydberg state $|46S_{1/2}\rangle$ with $r_b = 3\ \mu\mathrm{m}$, where the photon
statistics of the transmitted light are similar to those of the incident
coherent state (see Fig. 2b inset). We note that for $|100S_{1/2}\rangle$ the photons
are antibunched over a length scale $v_g\tau \approx 50\ \mu\mathrm{m}$ that exceeds the
blockade radius (see top axis of Fig. 2b), indicating the influence of
additional propagation effects beyond the simple picture outlined above.
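The text does not spell out the background correction. A standard model, which we assume here, treats the detected light as a signal fraction $\rho$ plus Poissonian background uncorrelated with everything else, giving $g^{(2)}_{\mathrm{meas}} = 1 + \rho^2 (g^{(2)}_{\mathrm{signal}} - 1)$. A perfectly antibunched signal then shows a floor of $1 - \rho^2$. Taking the spurious-event floor of 0.09 from Fig. 2b as that floor, inverting the model turns the measured 0.13 into about 0.044. This is consistent with the quoted 0.04(3), but it is a sketch of one plausible procedure, not necessarily the authors' method.

```python
def measured_g2(rho, g2_signal):
    """Forward model: g2 seen with signal fraction rho and Poissonian background.

    g2_meas = 1 + rho**2 * (g2_signal - 1); assumes the background is
    uncorrelated with itself and with the signal.
    """
    return 1.0 + rho ** 2 * (g2_signal - 1.0)


def background_corrected_g2(g2_meas, g2_floor):
    """Invert the forward model, with the floor g2_floor = 1 - rho**2.

    g2_floor is what a perfectly antibunched signal (g2_signal = 0) would show.
    """
    rho_sq = 1.0 - g2_floor
    return (g2_meas - g2_floor) / rho_sq


# Numbers from Fig. 2b: measured g2(0) = 0.13 and spurious-event floor 0.09.
print(f"{background_corrected_g2(0.13, 0.09):.3f}")  # 0.044
```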

To investigate the transmission characteristics of multiple photons
through the medium, we plot in Fig. 3a the output photon rate $R_o$,
scaled by the EIT transmission measured at low probe power, as a
function of the incident photon rate $R_i$. At first, $R_o$ increases linearly with
$R_i$ as expected, but then saturates abruptly to a constant value of
$R_o = 1.3(3)\ \mu\mathrm{s}^{-1}$. Note that these observations deviate from the
simplistic model of a multiphoton absorber that transmits only the
one-photon component of the incoming coherent state (black dashed
line in Fig. 3a). At the same time, the observed output flux corresponds to
less than one photon in the medium ($R_o^{-1} > \tau_d = 300$ ns). Figure 3b
shows the saturated output rate versus the ratio $r_b/w$ of blockade
radius and probe beam waist for a wide range of principal quantum
numbers, control field intensities and optical depths. The approximate
Figure 3 | Saturation behaviour of the transmission. a, Outgoing versus
incoming photon rate for $|100S_{1/2}\rangle$, $\gamma_{\mathrm{EIT}} = 2\pi \times 15$ MHz, OD = 26, and a
measured width $\tau_c$ = 130 ns of the antibunching feature in $g^{(2)}(\tau)$. All output
rates are scaled by the transmission of 50% at low photon rate due to linear
absorption, and corrected for the finite detection-path efficiency. The dashed
black curve outlines the expected rate if all multiphoton events in a time range
$T$ = 320 ns are fully blocked, while the green dashed curve assumes that all
multiphoton states within $T$ = 800 ns are converted into an outgoing one-photon
state. b, Saturated rate of outgoing photons $R_o\tau_c$ per antibunching
correlation time $\tau_c$, scaled by the linear absorption, as a function of the ratio
between the blockade radius $r_b$ and the probe beam waist $w$. The Rydberg states
are $|100S_{1/2}\rangle$ (blue, w = 4.5 μm, $\mathrm{OD}_b \approx 8$; pink, w = 4.5 μm, $\mathrm{OD}_b \approx 4$), $|77S_{1/2}\rangle$
(black, w = 4.5 μm, $\mathrm{OD}_b \approx 3$) and $|46S_{1/2}\rangle$ (green, w = 4.5 μm, $\mathrm{OD}_b \approx 0.7$; red,
w = 7 μm, $\mathrm{OD}_b \approx 0.7$). $\tau_c$ is estimated using the single-atom EIT linewidths
(squares, triangles, circles, diamonds) $\gamma_{\mathrm{EIT}} = 2\pi \times$ (6–16, 18–26, 29–36, 50) MHz,
and varies from 60 to 330 ns. The dashed line corresponds to $0.9(w/r_b)^2$,
indicating the expected scaling with transverse confinement for $w \gtrsim r_b$.


$R_o \propto (w/r_b)^2$ scaling, valid for $w \gtrsim r_b$, indicates that the saturated rate
for intermediate to strong interactions, $r_b \gtrsim l_a$, is largely determined
by the transverse geometrical constraint, that is, by the extent to
which the Rydberg polaritons can propagate side by side through
the medium.
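The two dashed reference curves in Fig. 3a can be modelled, we assume, by cutting the coherent input into windows of length $T$ with Poissonian photon number of mean $\mu = R_i T$. In the "fully blocked" model (black, $T$ = 320 ns), only windows holding exactly one photon transmit, so $R_o = R_i e^{-R_i T}$, which peaks at $R_i = 1/T$. In the "converted" model (green, $T$ = 800 ns), every non-empty window emits one photon, so $R_o = (1 - e^{-R_i T})/T$, which saturates at $1/T = 1.25\ \mu\mathrm{s}^{-1}$. These are our reconstructions of the described models, not formulas quoted by the authors.

```python
import math


def blocked_output_rate(rate_in, window):
    """Only windows holding exactly one photon transmit: R_o = R_i * exp(-R_i * T).

    Hypothetical model: the coherent input is cut into windows of length T with
    Poissonian photon number of mean mu = R_i * T; windows with two or more
    photons are fully absorbed.
    """
    mu = rate_in * window
    return mu * math.exp(-mu) / window


def converted_output_rate(rate_in, window):
    """Every non-empty window emits exactly one photon: R_o = (1 - exp(-R_i*T)) / T."""
    return -math.expm1(-rate_in * window) / window


# Rates in photons per microsecond, windows in microseconds (Fig. 3a values).
for r_in in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(r_in, round(blocked_output_rate(r_in, 0.32), 3),
          round(converted_output_rate(r_in, 0.80), 3))
```

`math.expm1` keeps the converted model accurate at very low input rates, where $1 - e^{-\mu}$ would otherwise lose precision.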

Two important features of the photon-photon blockade are the
degree of two-photon suppression at equal times, $g^{(2)}(0)$, and the
associated correlation time, that is, the width $\tau_c$ of the antibunching
feature in $g^{(2)}(\tau)$. As discussed in detail below, the blockade mechanism
is most effective if the optical depth per blockade radius, $\mathrm{OD}_b = r_b/l_a$,
exceeds unity, and if the system is effectively one-dimensional,
$r_b > w$. Because the blockade radius increases with the principal
quantum number $n$ as $r_b \propto n^{11/6}$, the combination of both effects
results in a steep dependence of $g^{(2)}(0)$ upon $n$. Figure 4a, b shows that
$g^{(2)}(0)$ improves with the principal quantum number $n$ of the Rydberg
state and the interaction strength $r_b/l_a$, resulting in a more than tenfold
suppression of the two-photon transmission, limited by independently
measured background light on the photon detectors (dotted lines). At
the same time, the observed width $\tau_c$ of the $g^{(2)}$ feature considerably
exceeds the photon travel time $\tau_b = r_b/v_g \approx 50$ ns through the blockade
radius (Fig. 4c, d). Close examination (Fig. 4d) reveals that the
correlation time is of the same order as, and scales inversely
with, the spectral width $B = \gamma_{\mathrm{EIT}}/\sqrt{8\,\mathrm{OD}}$ of the EIT
transparency window. This observation suggests that propagation
effects play an important role in establishing the $g^{(2)}$ correlation time
$\tau_c$ in a medium of large optical depth. We observe that, under
appropriate conditions, two-photon events are suppressed inside the
medium on a length scale that approaches the size $\sigma_z \approx 40\ \mu\mathrm{m}$ of
the entire atomic ensemble, and on a timescale that approaches the
intrinsic coherence time $\gamma_{gr}^{-1} \approx 500$ ns.
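To make the comparison between the two timescales concrete, the sketch evaluates the blockade transit time $\tau_b = r_b/v_g$ and the diffusive scale $1.05/B$ (the dashed line of Fig. 4d), using the Fig. 2 parameters. We assume $\gamma_{\mathrm{EIT}}$ is an angular frequency (rad/s) when forming $1/B$. For these inputs the diffusive scale is about four times longer than $\tau_b$, consistent with the statement that $\tau_c$ is set by the EIT bandwidth rather than by the blockade radius.

```python
import math


def eit_transparency_width(gamma_eit, optical_depth):
    """B = gamma_EIT / sqrt(8 * OD), in the same units as gamma_EIT."""
    return gamma_eit / math.sqrt(8.0 * optical_depth)


def blockade_transit_time(r_b, v_g):
    """tau_b = r_b / v_g: time for a polariton to cross one blockade radius."""
    return r_b / v_g


gamma_eit = 2 * math.pi * 23e6                            # rad/s (Fig. 2 parameters)
tau_b = blockade_transit_time(13e-6, 400.0)               # r_b = 13 um, v_g ~ 400 m/s
tau_diff = 1.05 / eit_transparency_width(gamma_eit, 40)   # dashed line of Fig. 4d
print(f"tau_b    = {tau_b * 1e9:.1f} ns")                 # 32.5 ns
print(f"1.05 / B = {tau_diff * 1e9:.0f} ns")              # about 130 ns
```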

To gain further understanding of these observations, we theoretically
analyse the photon propagation dynamics in the weak-probe limit,
where the average number of photons inside the medium is much less
than one. In this case, it suffices to consider two polaritons (Fig. 1b). The
corresponding two-photon component of the state vector is
Figure 4 | Dependence of the correlation function on EIT parameters.
a, b, Equal-time photon-photon correlation $g^{(2)}(0)$ as a function of OD for
$|77S_{1/2}\rangle$ (a) and $|100S_{1/2}\rangle$ (b), for a set of single-atom EIT linewidths (circles,
down-triangles, up-triangles, left-triangles) $\gamma_{\mathrm{EIT}} = 2\pi \times$ (20, 27, 16, 26) MHz.
c, d, Width $\tau_c$ of the antibunching feature in $g^{(2)}(\tau)$ as a function of optical
depth (c) and EIT transparency width (d) $B = \gamma_{\mathrm{EIT}}/\sqrt{8\,\mathrm{OD}}$, respectively. Solid
lines in a–c are numerical solutions for a probe beam waist w = 6 μm, including
detection noise in a, b (dotted lines). The black dashed line in d is $1.05/B$ and
derives from an approximate analytical solution of equation (1) (see
Supplementary Information). Error bars, 1σ statistical uncertainty.


$$|\psi_2(t)\rangle = \frac{1}{2}\int \mathrm{d}\mathbf{r}_1\,\mathrm{d}\mathbf{r}_2\,\mathcal{E}\mathcal{E}(\mathbf{r}_1,\mathbf{r}_2,t)\,\hat{\mathcal{E}}^{\dagger}(\mathbf{r}_1)\,\hat{\mathcal{E}}^{\dagger}(\mathbf{r}_2)\,|0\rangle,$$

where $\hat{\mathcal{E}}(\mathbf{r})$ denotes the photon field operator, and $|\mathcal{E}\mathcal{E}(\mathbf{r}_1,\mathbf{r}_2,t)|^2$ is the probability of finding
two photons at locations $\mathbf{r}_1$, $\mathbf{r}_2$. This probability directly yields the
spatially dependent photon-photon correlation function and, via the
group velocity $v_g$, the corresponding temporal correlation function
$g^{(2)}(\tau)$. An intuitive picture emerges if we make the simplification of a
tightly focused probe beam (one-dimensional approximation)
travelling through a homogeneous medium with perfect linear EIT
transmission. In this case, the steady-state two-photon amplitude in
the medium obeys (see Supplementary Information):
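$$-\partial_R\,\mathcal{E}\mathcal{E}(z_1,z_2) = \left[\frac{V(r)}{l_a} - 4\,l_a\,\frac{\partial^2}{\partial r^2}\right]\mathcal{E}\mathcal{E}(z_1,z_2) \qquad (1)$$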

where $R = (z_1 + z_2)/2$ and $r = z_1 - z_2$ are the centre-of-mass and relative
coordinates of the two photons, respectively. The function
$V(r) = r_b^6/(r_b^6 - 2i r^6)$ can be regarded as an effective potential that
describes the impact of Rydberg-Rydberg interactions. For large
photon-photon distances, $r \gg r_b$, the potential $V$ vanishes, and equation (1)
yields perfect transmission under EIT, while for distances $r < r_b$
the interaction $V$ modifies the two-photon propagation. According to
equation (1), photon correlations emerge from a combination of two
processes. The first term acts inside the blockade radius $r_b$ and describes
absorption with a coefficient $l_a^{-1}$ as the interaction $V$ tunes EIT out of
resonance. This would create a sharp dip in the two-photon correlation
function with a corresponding correlation time $\tau_b = r_b/v_g$ associated
with the blockade radius. However, if the corresponding spectral width
$\sim\tau_b^{-1}$ exceeds the spectral width $B$ of the EIT transparency window,
the second, diffusion-like term acts to broaden the absorption dip
(Fig. 1c) in space and time, increasing the photon-photon correlation
time $\tau_c$ beyond $\tau_b$ towards a value set by the EIT transparency width
(Fig. 4d). To maintain strong two-photon suppression ($g^{(2)}(0) \ll 1$) in
the presence of EIT-induced diffusion, the loss term must exceed the
diffusion on the length scale of the blockade radius, requiring $r_b > l_a$.
A large optical depth $\mathrm{OD}_b = r_b/l_a$ of the blockaded region is therefore the
key experimental feature that allows us to extend the earlier studies
into the quantum nonlinear regime.
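The competition between loss and diffusion can be reproduced with a few lines of finite differences. The sketch below takes the equation in the form $\partial_R\mathcal{E}\mathcal{E} = -(V(r)/l_a)\,\mathcal{E}\mathcal{E} + 4 l_a\,\partial_r^2\mathcal{E}\mathcal{E}$ with $V(r) = r_b^6/(r_b^6 - 2ir^6)$ (our reading of the equation). It starts from an uncorrelated input $\mathcal{E}\mathcal{E} = 1$ and pins the amplitude to 1 far from the blockade region; both are simplifying assumptions, and the function name and parameters are hypothetical. It is an illustration of the mechanism, not the authors' numerical model, which includes the Gaussian density profile, finite beam waist and decoherence. The output $|\mathcal{E}\mathcal{E}(r)|^2$ shows a deep dip at $r = 0$ whose wings extend well beyond $r_b$ because of the diffusive term.

```python
import math


def propagate_two_photon(r_b, l_a, length, half_width=40.0, dr=0.5):
    """Integrate dEE/dR = -V(r)/l_a * EE + 4*l_a * d^2EE/dr^2 over `length`.

    Explicit Euler steps in the centre-of-mass coordinate R, central differences
    in the relative coordinate r. Assumptions (ours): uncorrelated input EE = 1,
    and EE pinned to 1 at r = +/-half_width (far outside the blockade radius).
    Returns the grid r and |EE(r)|^2, i.e. g2 versus separation after `length`.
    """
    n = int(round(2 * half_width / dr)) + 1
    rs = [-half_width + i * dr for i in range(n)]
    diff = 4.0 * l_a                          # diffusion coefficient of the equation
    dR = 0.4 * dr * dr / (2.0 * diff)         # below the explicit-Euler limit dr^2/(2*diff)
    steps = int(math.ceil(length / dR))
    dR = length / steps
    loss = [r_b ** 6 / (r_b ** 6 - 2j * r ** 6) / l_a for r in rs]  # V(r) / l_a
    ee = [1.0 + 0j] * n
    for _ in range(steps):
        new = ee[:]                           # end points keep their fixed value
        for i in range(1, n - 1):
            lap = (ee[i + 1] - 2.0 * ee[i] + ee[i - 1]) / (dr * dr)
            new[i] = ee[i] + dR * (diff * lap - loss[i] * ee[i])
        ee = new
    return rs, [abs(x) ** 2 for x in ee]


# Lengths in units of l_a; r_b = 5 l_a (OD_b = 5) and a 20 l_a medium are illustrative.
rs, g2 = propagate_two_photon(r_b=5.0, l_a=1.0, length=20.0)
for r in (0.0, 5.0, 10.0, 20.0, 30.0):
    i = int(round((r + 40.0) / 0.5))
    print(f"r = {r:4.1f} l_a   g2 = {g2[i]:.3f}")
```

The explicit scheme is chosen for transparency; the step in $R$ is kept at 40% of the diffusive stability limit so that $|\mathcal{E}\mathcal{E}|$ cannot grow spuriously.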

For direct comparisons with our experiments, we solve numerically 
the full set of propagation equations accounting for the Gaussian density 
profile of the trapped atomic cloud, the finite waist of the probe beam, and 
the imperfect single-photon transmission due to finite decoherence $\gamma_{gr}$ of
the two-photon transition. As shown in Figs 2 and 4, the theory captures 
the essential features of our measured correlation functions and, 
moreover, reproduces their dependence on the Rydberg states, control 
laser intensities and optical depths of the sample over a wide range of 
parameters. This detailed theoretical understanding also allows us to 
analyse the prospects for possible future improvements. These include 
a reduction of Doppler broadening (through lower atomic temperature 
or the use of counter-propagating probe and control beams) to increase 
the linear transmission from 60% towards unity, the excitation of even 
higher-lying Rydberg states for larger blockade radius, and larger atomic 
density to further increase the optical depth per blockade radius OD, and 
overall optical depth OD. 

Our observations suggest intriguing prospects for ultimate quantum
control of light quanta. For example, by storing a single photon in a
Rydberg state and subsequently transmitting a second Rydberg
polariton, a single-photon switch can be created. It can be used, for
example, for quantum non-demolition measurements of optical
photons. At the same time, by using strong interactions in the dispersive
regime, the present approach can be used to implement deterministic
quantum logic gates, which would constitute a major
advance towards all-optical quantum information processing. Last,
our results may open the door to exploring the quantum dynamics of
strongly interacting photonic many-body systems. For example, it
may be possible to create a crystalline state of strongly interacting
polaritons. Beyond these specific applications, our work demonstrates
that unique quantum nonlinear optical materials can be created
by combining slow-light propagation with strong atom-atom interactions,
an approach that could potentially be extended to realize other
material systems with quantum nonlinearities.
METHODS SUMMARY


An ensemble of 6 X 10° laser-cooled atoms is captured in a magneto-optical trap 
(MOT) every 300 ms. The trapped cloud is compressed and loaded into the dipole 
trap by the combined actions of increasing the magnetic-field gradient to
35 G cm$^{-1}$, detuning the MOT trapping frequency by −30 MHz and reducing
the MOT repumper intensity to 10 μW cm$^{-2}$. The magnetic fields are then rapidly
shut off, allowing for 10 ms of molasses cooling to a temperature of 35 μK. The
crossed dipole trap holds up to $10^5$ atoms at a peak density of $2 \times 10^{12}$ atoms cm$^{-3}$
and a measured optical depth of OD = 50. 

The probe beam is focused to a $1/e^2$ waist w = 4.5 μm by a confocal arrangement
of achromatic doublet lenses with focal length 30 mm and diameter 6.25 mm. The
control field is co-propagating with the probe beam. The frequencies of both lasers
are locked to an optical Fabry-Perot resonator that is stabilized against long-term
drifts to a Doppler-free atomic resonance line. The measured short-term linewidths
are 120 kHz and 80 kHz for the probe and control laser, respectively.
The transmitted control light is separated from the probe light by a combination 
of interference and absorption filters. 

The intensity correlation function of the outgoing probe field is measured with 
two single-photon detectors. Spurious detection events typically limit $g^{(2)}(\tau)$ to
≈0.1. These include dark counts from the detector, imperfect polarization of
the probe photons (light with the orthogonal circular polarization is only weakly 
absorbed by the medium) and residual control light. 
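For readers reproducing the measurement, a minimal Hanbury Brown-Twiss estimator is sketched below; it is a standard technique and our own sketch, not the authors' analysis code. It counts all pairs of clicks on the two detectors whose delay falls in a bin around $\tau$ and divides by the number expected for uncorrelated streams with the same mean rates, $N_1 N_2\,\Delta\tau/T_{\mathrm{rec}}$, so independent streams give $g^{(2)} \approx 1$. The function name and arguments are hypothetical, and edge effects at the start and end of the record are ignored.

```python
from bisect import bisect_left
import random


def g2_from_timestamps(t1, t2, tau, bin_width, duration):
    """Normalized coincidence rate g2(tau) from two detectors' click times.

    Counts pairs (a in t1, b in t2) with tau - bin/2 <= b - a < tau + bin/2 and
    divides by the count expected for uncorrelated streams of the same mean
    rates, N1 * N2 * bin_width / duration.
    """
    t2_sorted = sorted(t2)
    lo = tau - bin_width / 2.0
    hi = tau + bin_width / 2.0
    pairs = 0
    for a in t1:
        # number of detector-2 clicks b with a + lo <= b < a + hi
        pairs += bisect_left(t2_sorted, a + hi) - bisect_left(t2_sorted, a + lo)
    expected = len(t1) * len(t2) * bin_width / duration
    return pairs / expected


# Independent click streams should give g2 close to 1 at any delay.
rng = random.Random(1)
clicks_1 = [rng.uniform(0.0, 1.0) for _ in range(2000)]
clicks_2 = [rng.uniform(0.0, 1.0) for _ in range(2000)]
print(round(g2_from_timestamps(clicks_1, clicks_2, 0.0, 0.01, 1.0), 2))
```

Sorting one stream once and using binary search keeps the cost at $O(N \log N)$ per delay bin, instead of comparing every pair of clicks.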
A Luttinger liquid is an interacting one-dimensional electronic
system, quite distinct from the ‘conventional’ Fermi liquids
formed by interacting electrons in two and three dimensions.
Some of the most striking properties of Luttinger liquids are
revealed in the process of electron tunnelling. For example, as a
function of the applied bias voltage or temperature, the tunnelling
current exhibits a non-trivial power-law suppression. (There is
no such suppression in a conventional Fermi liquid.) Here, using a
carbon nanotube connected to resistive leads, we create a system
that emulates tunnelling in a Luttinger liquid, by controlling the
interaction of the tunnelling electron with its environment. We
further replace a single tunnelling barrier with a double-barrier,
resonant-level structure and investigate resonant tunnelling
between Luttinger liquids. At low temperatures, we observe perfect
transparency of the resonant level embedded in the interacting
environment, and the width of the resonance tends to zero. We
argue that this behaviour results from many-body physics of interacting
electrons, and signals the presence of a quantum phase
transition. Given that many parameters, including the interaction
strength, can be precisely controlled in our samples, this is
an attractive model system for studying quantum critical phenomena
in general, with wide-reaching implications for understanding
quantum phase transitions in more complex systems, such as cold
atoms and strongly correlated bulk materials.

Unlike two- and three-dimensional Fermi liquids, a Luttinger liquid
completely ‘dissolves’ individual electrons, replacing them with collective
plasmon waves. When, in the process of quantum-mechanical tunnelling,
an external electron is added to the Luttinger liquid, the plasmons
spread the charge through the system, akin to the ripples from a raindrop
on the surface of a pond. At zero temperature, the tunnelling electron
does not have the necessary energy to excite the plasmons. As a result, the
tunnelling conductance between a normal metal and a Luttinger liquid,
or between two Luttinger liquids, is suppressed at low temperature, with
a power-law dependence on temperature.

Even more interesting is the case of resonant tunnelling, in which a
single tunnel barrier between Luttinger liquids is replaced by a resonant
level formed in a double-barrier quantum structure. Starting with the
seminal work of ref. 8, this problem has received significant theoretical
attention. Perhaps the most spectacular prediction is the existence of
resonance peaks with perfect conductance (full transparency) but
vanishingly small width (infinite lifetime) at zero temperature. These
resonances require two Luttinger liquids that are symmetrically coupled
to the resonant level. Several experiments have addressed resonant
tunnelling in a Luttinger liquid in the low-temperature limit, but
with no attempt to control the tunnelling strength. In this work, we
create a system analogous to a Luttinger liquid by properly designing
electron interactions in the resonant level’s environment, and we tune
the system to the symmetric coupling point.

We realize both the single-barrier tunnelling and the double-barrier
resonant tunnelling regimes in short (~300-nm) segments of
carbon nanotubes (Fig. 1a). Fig. 1b shows the representative electrical
conductance through such a sample. The size quantization of electron
states in the nanotube, combined with the mutual repulsion of the
electrons, gives rise to a ‘Coulomb blockade’ pattern.

We first show Luttinger-liquid-like properties in tunnelling through 
a single barrier by tuning the gate voltage to Coulomb blockade valleys 
Y and Z of Fig. 1b, where no low-energy excitations exist in the 
nanotube. Electrons are then transmitted through the nanotube by 
co-tunnelling processes (see Supplementary Information section I). 
These processes are almost independent of energy, at scales smaller 
than the nanotube charging energy and level spacing (both millielectronvolts);
at such energies, the nanotube should behave just like a
single tunnel barrier. 

The conductance in valleys Y and Z, plotted against bias voltage, V, 
shows a surprising zero-bias anomaly (ZBA), which gets progressively 
Figure 1 | Emulating Luttinger liquid with resistive environment. a, AFM
image of the sample. The carbon nanotube (CNT) is contacted by two metal
leads (S and D), forming a quantum dot. Two side gates (SG) control
the coupling of the dot to the two leads, as used to obtain the results shown in Fig. 2.
b, Differential conductance $G = dI/dV$ of a similar sample, as a function of the
(back-)gate voltage, $V_{\mathrm{gate}}$, at T = 1.8 K. We focus on Coulomb blockade valleys
Y and Z, in which electron transport is conducted through co-tunnelling
processes. c, Differential conductance versus bias voltage V, showing a
pronounced zero-bias anomaly in both valleys. T = 1.7–0.03 K (top to bottom).
d, $G(V,T)$ data measured in valley Y at different temperatures (coloured points)
can be rescaled to collapse on the same universal curve, described by the
theoretical expression of ref. 21 (yellow line). Inset shows the clear power-law
dependence of zero-bias conductance $G(0,T)$ on temperature, with the same
exponent in both valleys.
deeper as the temperature decreases (Fig. 1c). As the shape of the ZBA
in the two valleys is the same up to an overall scale factor, the existence
of the ZBA is not due to the nanotube itself. Indeed, the distinctive feature
of our samples is the metal leads to the nanotube, which are made
rather resistive (kilohms; see Supplementary Information section II).
‘Tunnelling with dissipation’ between such resistive leads is known to
result in suppressed conductance: $dI/dV = G \propto \max(k_BT, eV)^{2r}$ (refs
18–21), where $I$ is current, $G$ is conductance, $k_B$ is the Boltzmann
constant, $T$ is temperature and $r = e^2R/h$, the ratio of the resistance
of the leads, $R$, to the quantum resistance $h/e^2$. This expression is
similar to the power-law suppression expected for tunnelling in a
Luttinger liquid. However, no real Luttinger liquid is present in
our nanotube; at our operating temperatures, the length of an ideal
clean nanotube would have to be about 100 μm to suppress the size
quantization. Note the highly unusual appearance of the resistance in
the exponent, which allows us to control the strength of tunnelling
suppression simply by changing $R$.

Experimentally, the zero-bias conductance scales as $G(0,T) \propto T^{2r}$,
with the same exponent $2r \approx 0.6$ found in both valleys (Fig. 1d inset);
this value is consistent with the leads resistance ($R \approx 6.5$ kΩ in this
sample; see Supplementary Information section II). Furthermore, we
can rescale the whole set of $G(V,T)$ curves measured in valley Y as
shown in Fig. 1d, which presents $G(V,T)/G(0,T)$ as a function of
$eV/k_BT$, a dimensionless ratio of bias to temperature. The yellow
curve overlying the symbols is the result of the full theoretical
expression describing tunnelling with dissipation, in which we use
the same value of $r = 0.3$ extracted from the temperature dependence.
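Extracting the exponent from $G(0,T) \propto T^{2r}$ amounts to a straight-line fit in log-log coordinates, and converting a lead resistance to $r = e^2R/h$ only needs the resistance quantum $h/e^2 \approx 25.813$ kΩ. The sketch below shows both steps; the function names are hypothetical, and the check uses noise-free synthetic data, not measured values.

```python
import math

# von Klitzing constant h/e^2 in ohms (exact in the 2019 SI).
H_OVER_E2 = 25812.80745


def dissipation_parameter(lead_resistance_ohm):
    """r = e^2 R / h: lead resistance in units of the resistance quantum h/e^2."""
    return lead_resistance_ohm / H_OVER_E2


def fit_power_law_exponent(temperatures, conductances):
    """Least-squares slope of log G versus log T; equals 2r if G(0, T) ~ T**(2r)."""
    xs = [math.log(t) for t in temperatures]
    ys = [math.log(g) for g in conductances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free check (not measured data): G = 0.1 * T**0.6.
temps = [0.03, 0.06, 0.12, 0.25, 0.5, 1.0]
exponent = fit_power_law_exponent(temps, [0.1 * t ** 0.6 for t in temps])
print(f"2r = {exponent:.3f}, r = {exponent / 2:.3f}")
```

An unweighted fit in log space is the simplest choice; with real data one would restrict it to the low-temperature range where the power law holds and weight points by their uncertainties.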

The expression used to fit the data in Fig. 1d is similar to the one
describing tunnelling between two Luttinger liquids. The similarity
may be understood qualitatively: for both tunnelling in a dissipative
environment and in a Luttinger liquid, the tunnelling electron’s charge
couples to a continuum of bosonic modes (plasmons); at zero energy
(temperature or bias), the electron cannot excite the modes, and
tunnelling is suppressed. Furthermore, the formal mapping of the
two problems has been demonstrated for the single-barrier case in
ref. 23. The recipe is to replace the Luttinger interaction parameter $g$
by $1/(r + 1)$: for example, the case of vanishing dissipation, $r = 0$,
corresponds to the non-interacting Luttinger liquid, $g = 1$. It is
important to realize that for $r \neq 0$ electrons do in fact interact with
each other through their coupling to the bosonic modes. We use the
analogy between tunnelling in a dissipative environment and tunnelling
in a Luttinger liquid throughout the rest of this text.

Having established Luttinger-liquid-like behaviour in single-barrier
tunnelling, we now turn to the main focus of this work: resonant
tunnelling between interacting leads. We study single-electron conductance
peaks, similar to those shown in Fig. 1b, but measured on a
different sample. A key feature of our experiment is the use of additional
side gates to tune the coupling of the resonant level to the leads
(Fig. 1a). Figure 2a shows the differential conductance map as a function
of the side- and back-gate voltages. Clearly, the heights of the
peaks change along the traces. For several of the peaks, the conductance
reaches a maximum at some intermediate value of the side-gate
voltage, indicating that the tunnelling rates from the resonant level to
the source and the drain are equal: $\Gamma_S = \Gamma_D$ (‘symmetric coupling’).

We focus on peak X of Fig. 2a, with the side gate tuned so that the
tunnelling is either symmetric (Fig. 2b) or asymmetric (Fig. 2c).
Clearly, the two cases behave in markedly different ways. In the
asymmetric case, the peak height decreases at low temperatures, while
the width saturates. In the symmetric case, the peak width decreases,
while the peak height grows and reaches $e^2/h$. It is remarkable that the
resonant tunnelling conductance can reach the unitary limit despite
coupling to the interacting leads, which suppress tunnelling in the
single-barrier case.

To account for the observed behaviour, we have developed a model 
(see Supplementary Information sections IV-VI) of a resonant level 
connected to two electron reservoirs, with excitation of environmental 
Figure 2 | Resonant lineshape: symmetric and asymmetric cases. a, Zero-bias
differential conductance as a function of $V_{\mathrm{gate}}$ and the voltage $V_{\mathrm{SG}}$ applied to
one of the side gates. Several peaks reach a maximal conductance of $e^2/h$ (1.0 on
colour scale) along their traces in the range shown here. The base temperature is
T = 50 mK; a perpendicular magnetic field of 6 T is applied to select a single spin
species. White horizontal lines and ‘X’ are explained below. b, c, Resonant
conductance for symmetric (b) and asymmetric (c) coupling, measured at
several temperatures on the peak marked ‘X’ in a, as a function of $\Delta V_{\mathrm{gate}}$, the gate
voltage relative to the centre of the peak. The side-gate voltages are fixed at the
values indicated by white lines in a. As the temperature is reduced, in the
symmetric case the peak becomes taller and narrower. By contrast, in the
asymmetric case, the peak becomes shorter and its width saturates.


modes represented by a dynamical phase associated with the tunnel- 
ling matrix element’*. We show that the analogy between tunnelling 
with dissipation and tunnelling in a Luttinger liquid**”**>”° further 
extends to our case of resonant tunnelling. Based on our mapping 
and the Luttinger-liquid predictions*’, in the case of symmetric 
coupling we expect the peak height to saturate at e/h (spinless case), 
and the resonance width to scale at low temperatures as T””* 
(ref. 8). For asymmetric coupling, the resonance width is predicted 
to saturate at sufficiently low temperature’®"’, while the peak height 
should scale to zero as G x T””, featuring the same exponent as in the 
single-barrier (non-resonant) tunnelling. 
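The two predicted exponents are simple functions of the interaction strength r. A minimal Python sketch (the function name is hypothetical) tabulates only the relations stated in the text: resonance width $\propto T^{r/(r+1)}$ for symmetric coupling and peak height $G \propto T^{2r}$ for asymmetric coupling.

```python
def predicted_exponents(r: float) -> dict:
    """Low-temperature power-law exponents quoted in the text, as functions of r."""
    if r < 0:
        raise ValueError("interaction strength r must be non-negative")
    return {
        # symmetric coupling: resonance width ~ T**(r/(r+1)); peak height -> e^2/h
        "symmetric_width": r / (r + 1.0),
        # asymmetric coupling: peak height G ~ T**(2r), as for a single barrier
        "asymmetric_height": 2.0 * r,
    }


for r in (0.0, 0.5, 0.75, 1.0):
    exps = predicted_exponents(r)
    print(f"r = {r:.2f}: width exponent {exps['symmetric_width']:.3f}, "
          f"height exponent {exps['asymmetric_height']:.2f}")
```

For r = 0.75 this gives a width exponent of 0.429 (the ≈ 0.43 quoted below) and a height exponent of 1.5.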

Our experiment clearly corroborates these predictions (Fig. 3). Quantitatively, we extract r ≈ 0.75 from the scaling of the asymmetric peak height, which agrees with the resistance of the leads in this sample. The width of the symmetric peak scales with an exponent of 0.45, consistent with r/(r + 1) ≈ 0.43. (We discuss the accuracy of extracting the exponents in Supplementary Information section III.) Overall, application of refs 8-12 to our experiment describes the observed behaviour remarkably well, in both the symmetric and asymmetric cases.

Figure 3 | Resonant peak parameters at different degrees of asymmetry. Conductance peak height (a) and width (b) measured at several values of $V_{\rm SG}$, which controls the degree of asymmetry of the tunnel barrier. (Same sample as in Fig. 2, but different peak; from symmetric to most asymmetric, ($V_{\rm SG}$, $V_{\rm gate}$) values range from (3.25, −0.518) to (−6.10, −0.692).) Note that in the symmetric case, with decreasing temperature, the peak height saturates at $e^2/h$, and its width (full width at half-maximum) monotonically decreases. In the asymmetric cases, the behaviour is the opposite: the width of the peak saturates, while the peak height decreases.
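Extracting r from the asymmetric peak height amounts to reading off the slope of a log-log plot of G against T. The following is a minimal sketch using synthetic data (not the paper's measurements) and an ordinary least-squares fit via `numpy.polyfit`. The analysis described in Supplementary Information section III may weight points or restrict the temperature range differently.

```python
import numpy as np


def fit_power_law(temperature, conductance):
    """Return p from a least-squares fit of log G = p log T + const."""
    log_t = np.log(np.asarray(temperature, dtype=float))
    log_g = np.log(np.asarray(conductance, dtype=float))
    slope, _intercept = np.polyfit(log_t, log_g, 1)
    return slope


# Synthetic asymmetric-peak heights (NOT measured data): G = 0.8 * T**1.5,
# T in kelvin, G in units of e^2/h.
T = np.array([0.05, 0.08, 0.12, 0.2, 0.3, 0.5])
G = 0.8 * T**1.5

p = fit_power_law(T, G)
r_est = p / 2.0  # asymmetric coupling: G ~ T**(2r)
print(f"fitted exponent {p:.3f} -> r = {r_est:.3f}")
```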

Note that the width of the conductance peak in the symmetric case 
decreases monotonically with decreasing temperature. In the limit of 
zero temperature, we expect that the conductance will equal zero 
everywhere, except for a singular point at the centre of the peak. 
When tunnelling asymmetry is introduced, the singular point 
disappears, and the low-temperature conductance tends to zero at 
any gate voltage, $V_{\rm gate}$. This behaviour indicates a QPT for symmetric coupling, $\Gamma_S = \Gamma_D$. Technically, we refer to a ‘boundary’ QPT, in which only a local part of a larger system (local site plus environment) undergoes the transition (see, for example, ref. 5). QPTs found in strongly correlated bulk materials are often explained by invoking interactions between local sites and collective modes. In the same
spirit, our observation provides an example of a QPT in a highly 
tunable system that emulates such a ‘local site’ (that is, the resonant 
level), embedded in an interacting host. 

Following ref. 12, we map our model in the r = 1 case onto the exotic two-channel Kondo model, for which a QPT is known to occur exactly for symmetric coupling. (See Supplementary Information
section V for details.) In both models, the origin of the quantum critical 
behaviour is the competition between the two channels attempting to 
screen the local site (spin or resonant level). 

Intermediate values of r, between 0 and 1, do not allow for a simple interpretation in terms of any Kondo model with non-interacting leads, but represent a continuous evolution between the non-interacting resonant level at r = 0 and the two-channel Kondo model. The critical exponents describing the system parameters close to the quantum critical point are not fixed, but are controlled by the value of r. Thus, our system not only provides new insight into the two-channel Kondo model (a model example of quantum criticality) but also gives access to a new family of quantum critical points for r > 0.

The QPT observed here is different from the various QPTs observed and predicted in quantum dots coupled to a single screening channel; there, the QPTs are of the Kosterlitz-Thouless type, whereas in our case the QPT is of second order (see
Supplementary Information section VI). Furthermore, in our case, 
the key ingredient that enables the QPT is the symmetric coupling 
to the two leads, which allows for their competition; the interaction in 
the leads (finite r) prevents their hybridization. 


LETTER 


[Figure 4 graphic not reproduced. Panel a: conductance versus Temperature (K), logarithmic axes. Panel b: proposed phase diagram.]


Figure 4 | Phase diagram and the quantum critical point. a, Conductance in 
the symmetric-coupling case, plotted versus temperature at different values of 
gate voltage: $\Delta V_{\rm gate}$ = 0, 0.7, 1.3, 2, 2.6, 5.0, 3.8 and 6.5 mV, from top to bottom.
Note the similarity to Fig. 3a. b, Proposed phase diagram: the quantum critical
point at the centre (symmetric coupling and $\Delta V_{\rm gate} = 0$) has unitary conductance.
Any deviations from this point result in vanishing conductance at T = 0. 


Finally, the conductance in the symmetric case can be plotted as a function of temperature for several values of $\Delta V_{\rm gate}$ (Fig. 4a). The similarity with Fig. 3a is striking; apparently, one can tune away from the unitary resonance either by inducing asymmetry (Fig. 3a) or by applying the gate voltage (Fig. 4a), with virtually the same results. Note that the downturn of peak height in Figs 3a and 4a occurs at progressively lower temperature as either the degree of asymmetry or $\Delta V_{\rm gate}$ is reduced. Clearly, a new energy scale is emerging in the system, controlled by proximity to the quantum critical point. We anticipate that this scale should vanish exactly at that point. We therefore propose the phase diagram shown in Fig. 4b, with a quantum critical point at $\Gamma_S = \Gamma_D$, $\Delta V_{\rm gate} = 0$ (I. Affleck, personal communication). The four quadrants represent the states of the nanotube filled with N or N + 1 electrons, or coupled more strongly to either the source or the drain. The boundaries between the quadrants are smeared, and at T = 0 the conductance tends to zero everywhere, except at the quantum critical point.

In conclusion, we have investigated resonant tunnelling between 
interacting leads emulating Luttinger liquids. For symmetric coupling 
of the spinless resonant level to the two leads, and on resonance, the 
low-temperature conductance saturates at the unitary value of $e^2/h$.
We associate this behaviour with a quantum critical point, which exists
at $\Gamma_S = \Gamma_D$ in the presence of a finite interaction strength r > 0.
Moving away from this point by inducing tunnelling asymmetry 
results in suppression of conductance at low temperature and smear- 
ing of the QPT. We believe that our work is the first example of a QPT 
in a highly tunable system, in which many parameters can be con- 
trolled, including the strength of interactions. 


Received 13 April; accepted 24 May 2012. 


1. Giamarchi, T. Quantum Physics in One Dimension (Oxford Univ. Press, 2004).
2. Chang, A. Chiral Luttinger liquids at the fractional quantum Hall edge. Rev. Mod. Phys. 75, 1449-1505 (2003).
3. Deshpande, V. V., Bockrath, M. W., Glazman, L. I. & Yacoby, A. Electron liquids and solids in one dimension. Nature 464, 209-216 (2010).
4. Sachdev, S. Quantum Phase Transitions 2nd edn (Cambridge Univ. Press, 2011).
5. Vojta, M. Impurity quantum phase transitions. Phil. Mag. 86, 1807-1846 (2006).
6. Bloch, I. Ultracold quantum gases in optical lattices. Nature Phys. 1, 23-30 (2005).
7. Si, Q. & Steglich, F. Heavy fermions and quantum phase transitions. Science 329, 1161-1166 (2010).
8. Kane, C. L. & Fisher, M. P. A. Transmission through barriers and resonant tunnelling in an interacting one-dimensional electron gas. Phys. Rev. B 46, 15233-15262 (1992).
9. Eggert, S. & Affleck, I. Magnetic impurities in half-integer-spin Heisenberg antiferromagnetic chains. Phys. Rev. B 46, 10866-10883 (1992).
10. Nazarov, Yu. V. & Glazman, L. I. Resonant tunnelling of interacting electrons in a one-dimensional wire. Phys. Rev. Lett. 91, 126804 (2003).
11. Polyakov, D. G. & Gornyi, I. V. Transport of interacting electrons through a double barrier in quantum wires. Phys. Rev. B 68, 035421 (2003).
12. Komnik, A. & Gogolin, A. O. Resonant tunnelling between Luttinger liquids: a solvable case. Phys. Rev. Lett. 90, 246403 (2003).


2 AUGUST 2012 | VOL 488 | NATURE | 63 


©2012 Macmillan Publishers Limited. All rights reserved 




13. Milliken, F., Umbach, C. & Webb, R. Indications of a Luttinger liquid in the fractional quantum Hall regime. Solid State Commun. 97, 309-313 (1996).
14. Maasilta, I. & Goldman, V. Line shape of resonant tunnelling between fractional quantum Hall edges. Phys. Rev. B 55, 4081-4084 (1997).
15. Kastner, M. A. Artificial atoms. Phys. Today 46, 24-31 (1993).
16. Kouwenhoven, L. P. et al. in Mesoscopic Electron Transport (eds Sohn, L. L., Kouwenhoven, L. P. & Schön, G.) 105-214 (Kluwer, 1997).
17. Leggett, A. J. et al. Dynamics of the dissipative two-state system. Rev. Mod. Phys. 59, 1-85 (1987).
18. Ingold, G.-L. & Nazarov, Y. V. in Single Charge Tunnelling: Coulomb Blockade Phenomena in Nanostructures (eds Grabert, H. & Devoret, M. H.) 21-107 (Plenum Press, 1992).
19. Flensberg, K., Girvin, S., Jonson, M., Penn, D. R. & Stiles, M. D. Quantum mechanics of the electromagnetic environment in the single-junction Coulomb blockade. Physica Scripta T42, 189-206 (1992).
20. Joyez, P., Esteve, D. & Devoret, M. H. How is the Coulomb blockade suppressed in high-conductance tunnel junctions? Phys. Rev. Lett. 80, 1956-1959 (1998).
21. Zheng, W., Friedman, J., Averin, D. V., Han, S. & Lukens, J. E. Observation of strong Coulomb blockade in resistively isolated tunnel junctions. Solid State Commun. 108, 839-843 (1998).
22. Sassetti, M., Napoli, F. & Weiss, U. Coherent transport of charge through a double barrier in a Luttinger liquid. Phys. Rev. B 52, 11213-11224 (1995).
23. Safi, I. & Saleur, H. One-channel conductor in an ohmic environment: mapping to a Tomonaga-Luttinger liquid and full counting statistics. Phys. Rev. Lett. 93, 126602 (2004).
24. Bomze, Y., Mebrahtu, H., Borzenets, I., Makarovski, A. & Finkelstein, G. Resonant tunnelling in a dissipative environment. Phys. Rev. B 79, 241402(R) (2009).
25. Le Hur, K. & Li, M.-R. Unification of electromagnetic noise and Luttinger liquid via a quantum dot. Phys. Rev. B 72, 073305 (2005).




26. Florens, S., Simon, P., Andergassen, S. & Feinberg, D. Interplay of electromagnetic 
noise and Kondo effect in quantum dots. Phys. Rev. B 75, 155321 (2007). 

27. Hewson, A. The Kondo Problem to Heavy Fermions (Cambridge Univ. Press, 1997). 

28. Potok, R. M., Rau, I. G., Shtrikman, H., Oreg, Y. & Goldhaber-Gordon, D. J.
Observation of the two-channel Kondo effect. Nature 446, 167-171 (2007).

29. Goldstein, M. & Berkovits, R. Capacitance of a resonant level coupled to Luttinger 
liquids. Phys. Rev. B 82, 161307 (2010). 

30. Roch, N., Florens, S., Bouchiat, V., Wernsdorfer, W. & Balestro, F. Quantum phase 
transition in a single-molecule quantum dot. Nature 453, 633-637 (2008). 


Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 


Acknowledgements We appreciate discussions with I. Affleck, D. V. Averin, A. M. Chang,
C. H. Chung, S. Florens, M. Goldstein, L. I. Glazman, K. Ingersent, K. Le Hur, M. Lavagna,
A. H. MacDonald, Yu. V. Nazarov, D. G. Polyakov and M. Vojta. We thank J. Liu for 
providing the nanotube growth facilities and W. Zhou for helping to optimize the 
nanotube synthesis. The work was supported by US DOE awards DE-SC0002765, 
DE-SC0005237 and DE-FG02-02ER15354. 


Author Contributions H.T.M., I.V.B. and G.F. designed the experiment. H.T.M. fabricated the samples. H.T.M., I.V.B., Y.V.B., A.S. and G.F. conducted the experiment. H.T.M. and G.F. analysed the data. H.T.M., D.E.L., H.Z., H.U.B. and G.F. interpreted the data. D.E.L., H.Z. and H.U.B. developed the theory.


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of this article at 
www.nature.com/nature. Correspondence and requests for materials should be 
addressed to G.F. (gleb@phy.duke.edu). 


LETTER 


doi:10.1038/nature11297 


A Newtonian approach to extraordinarily strong negative refraction


Hosang Yoon, Kitty Y. M. Yeung, Vladimir Umansky & Donhee Ham


Metamaterials with negative refractive indices can manipulate electromagnetic waves in unusual ways, and can be used to achieve, for example, sub-diffraction-limit focusing, the bending of light in the ‘wrong direction’, and reversed Doppler and Cerenkov effects. These counterintuitive and technologically useful behaviours have spurred considerable efforts to synthesize a broad array of negative-index metamaterials with engineered electric, magnetic or optical properties. Here we demonstrate another route to negative refraction by exploiting the inertia of electrons in semiconductor two-dimensional electron gases, collectively accelerated by electromagnetic waves according to Newton’s second law of motion, where this acceleration effect manifests as kinetic inductance. Using kinetic inductance to attain negative refraction was theoretically proposed for three-dimensional metallic nanoparticles and seen experimentally with surface plasmons on the surface of a three-dimensional metal. The two-dimensional electron gas that we use at cryogenic temperatures has a larger kinetic inductance than three-dimensional metals, leading to extraordinarily strong negative refraction at gigahertz frequencies, with an index as large as −700. This pronounced negative refractive index and the corresponding reduction in the effective wavelength open a path to miniaturization in the science and technology of negative refraction.

The idea of creating negative refraction by exploiting the collective electron acceleration (inertia) effect, or kinetic inductance, was theoretically proposed for specific arrangements of three-dimensional (3D) metallic nanoparticles. Experimentally, inertia-based negative refraction was implied in work where a particular guiding of surface plasmon polaritons on the surface of a 3D metal led to negative refraction; this cannot be explained without electron acceleration, because a defining component of plasmons is the time-varying kinetic energy of their constituent electrons, which implies their acceleration.
Semiconductor two-dimensional (2D) electron gases (2DEGs) possess a much larger kinetic inductance than 3D bulk metals. Here we create negative-index metamaterials by fully exploiting this large kinetic inductance, whose impact is manifested in the extraordinarily large negative index, which we measure to be as large as n = −700. This is two orders of magnitude larger than the index of n ≈ −5 to −1 in surface-plasmon-based negative refraction, and indicates that inertia is a much more important factor in our 2DEG case. It is also much larger than the theoretical expectation based on the kinetic inductance of 3D metallic nanoparticles, which is orders of magnitude smaller than our 2D kinetic inductance (Supplementary Information, section 1).
We choose a GaAs/AlGaAs 2DEG as a demonstration platform. Here electrons can accelerate for ~0.2 ns at a temperature of 4 K without scattering, with the result that their large kinetic inductance effect is not masked by scattering at and above gigahertz frequencies. Specifically, our metamaterial is a periodic array of mesa-etched 2DEG strips (Fig. 1a, b), each of which is connected to ground lines (labelled ‘G’ in Fig. 1a) at both ends via ohmic contacts. Each strip’s width and length are denoted W and l, respectively, and the centre-to-centre distance between neighbouring strips, or periodicity, is denoted a.
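The ~0.2 ns acceleration time follows from the Drude relation τ = μm*/e. The rough Python check below rests on two assumptions that this paragraph does not state: a 2DEG mobility of 4.6 × 10⁶ cm² V⁻¹ s⁻¹ (the Methods value) and the textbook GaAs effective mass m* ≈ 0.067 mₑ. In the Drude picture, the ratio of a strip's kinetic reactance to its resistance is ωτ, so ωτ > 1 marks where inertia dominates over scattering.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge, C
M_ELECTRON = 9.1093837015e-31   # free-electron mass, kg
M_EFF = 0.067 * M_ELECTRON      # assumed GaAs effective mass (not given in the text)


def momentum_relaxation_time(mobility_cm2_per_vs: float) -> float:
    """Drude estimate tau = mu * m* / e; mobility in cm^2 V^-1 s^-1, result in s."""
    mobility_si = mobility_cm2_per_vs * 1e-4  # cm^2 -> m^2
    return mobility_si * M_EFF / E_CHARGE


tau = momentum_relaxation_time(4.6e6)  # assumed 4 K mobility (Methods value)
print(f"tau = {tau * 1e9:.3f} ns")
for f_ghz in (1.0, 10.0, 30.0):
    omega = 2.0 * math.pi * f_ghz * 1e9
    # In the Drude model, (kinetic reactance) / (resistance) of a strip equals omega*tau.
    print(f"f = {f_ghz:4.0f} GHz: omega*tau = {omega * tau:.1f}")
```

It gives τ ≈ 0.175 ns, consistent with the ~0.2 ns quoted, and ωτ ≈ 1.1 already at 1 GHz.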


This metamaterial is excited by electromagnetic waves guided by the 
left signal line (labelled ‘S’ in Fig. 1a), which, flanked by the ground 
lines, forms an on-chip coplanar waveguide (CPW) with a 50-Ω characteristic impedance. This left signal line is extended to cover a few 2DEG strips on the left-hand side of the metamaterial, with dielectric between the signal line and the 2DEG strips. The metamaterial’s response is picked up by the right signal line (also labelled ‘S’) of another
CPW on the right-hand side of the metamaterial. 

The electric fields of the excitation electromagnetic wave, oscillating 
between the signal and ground lines of the left CPW, collectively 
accelerate electrons in the leftmost few 2DEG strips, producing 
currents along the strips. The resulting alterations of charge distri- 
bution in these strips will capacitively couple to neighbouring strips 
to the right, accelerating electrons there. This process repeats to deliver 
an ‘effective wave’ from left to right, perpendicular to the direction of 
the strips. From the circuit point of view, each 2DEG strip (along which electrons collectively accelerate, with the resulting current lagging the accelerating voltage by 90° according to Newton’s second law of motion) acts as a non-magnetic inductance of kinetic origin. This 2D kinetic inductance, $L_{K2D}$, results from Newton’s law: $L_{K2D} = \frac{m^*}{n_{2D} e^2} \times \frac{l_e}{W}$, where $m^*$, $e$ and $n_{2D}$ are respectively the electrons’ effective mass, charge and density per unit area, and $l_e$, which will be identified shortly, is the effective length of each strip, within which electrons accelerate in response to the excitation. Our metamaterial is then an array of capacitively coupled kinetic inductors (Fig. 1c and Supplementary Information, section 2), and may be
likened to a left-handed transmission line, which is an array of capacitively coupled magnetic inductors and is known to be negatively refracting. However, our negative refraction originates in a different physical phenomenon: our device uses extremely large 2DEG kinetic inductance, whereas the left-handed transmission line relies on a much smaller magnetic inductance.

Figure 1 | Device description. a, Optical image of a 2DEG strip-array metamaterial prototype. Ground-signal-ground (GSG) on-chip CPWs direct electromagnetic waves to and from the metamaterial. The inset shows a magnified portion of the strip array. In this specific prototype, W = 1 μm, l = 112 μm and a = 1.25 μm. b, Schematic of the metamaterial (not drawn to scale), with the front face corresponding to a cut through the dashed symmetry line in a. c, Circuit description of the half of the metamaterial below or above the symmetry line along the effective wave propagation direction (Supplementary Information, section 2).
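Plugging numbers into the kinetic-inductance formula reproduces the values quoted later in the text: 1.25 nH μm⁻¹ for a 1-μm-wide strip and $L_{K2D}$ ≈ 39 nH for the Fig. 2 structure. The sketch assumes m* ≈ 0.067 mₑ for GaAs, which the text does not state, together with the Methods carrier density of 1.9 × 10¹¹ cm⁻² and $l_e$ = 31 μm. The function name is illustrative.

```python
E_CHARGE = 1.602176634e-19      # elementary charge, C
M_ELECTRON = 9.1093837015e-31   # free-electron mass, kg
M_EFF = 0.067 * M_ELECTRON      # assumed GaAs effective mass


def kinetic_inductance(n2d_per_cm2: float, length_m: float, width_m: float) -> float:
    """L_K2D = m* / (n_2D e^2) * (l_e / W), in henries."""
    n2d = n2d_per_cm2 * 1e4                  # cm^-2 -> m^-2
    sheet = M_EFF / (n2d * E_CHARGE**2)      # kinetic inductance per square
    return sheet * length_m / width_m


n2d = 1.9e11                                     # carrier density (Methods), cm^-2
per_um = kinetic_inductance(n2d, 1e-6, 1e-6)     # 1-um-wide strip, per um of length
strip = kinetic_inductance(n2d, 31e-6, 1e-6)     # l_e = 31 um, W = 1 um
print(f"{per_um * 1e9:.2f} nH per um, {strip * 1e9:.1f} nH per strip")
```

With these inputs it prints about 1.25 nH per μm and 38.8 nH per strip.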

To examine the negative refraction behaviour of our device, we represent the effective wave, in terms of the voltage at the tip of the mth kinetic inductor (Fig. 1c), as $V_m(t) \propto e^{i(\omega t - mka)}$, where $\omega$ is the angular frequency and $k$ is the effective wavenumber. The standard circuit analysis of Fig. 1c yields a dispersion relation $\omega(k) = \omega_c/|\sin(ka/2)|$, where $\omega_c = [2\sqrt{L_{K2D}C}]^{-1}$ is the cut-off frequency at the boundary of the first Brillouin zone ($k = \pm\pi/a$) and $C$ is the capacitance between adjacent strips over the effective length (Supplementary Information, section 2). For $\omega > \omega_c$, the dispersion relation (Fig. 2a) predicts negative refraction, because the tangential slope $d\omega/dk$ (the group velocity) and the slope $\omega/k$ (the phase velocity) have opposite signs. The cut-off behaviour results from the metamaterial’s high-pass nature, and can also be seen from the current distributions across the metamaterial below and above the cut-off frequency (Fig. 2b), which we simulated using an electromagnetic field solver (Supplementary Information, section 6). Beyond the cut-off frequency (Fig. 2b, right), the current is concentrated at the bottom and top regions of the strips, from which $l_e$ can be estimated. We note that, whereas a single sheet of 2DEG exhibits ordinary dispersion, the acceleration of electrons along an array of strips of 2DEG, perpendicular to the direction of effective wave propagation, causes negative refraction.
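The dispersion relation and the sign argument can be checked numerically. This minimal Python sketch uses the Fig. 2a estimates ($L_{K2D}$ = 39 nH, C = 4.6 fF, a = 1.25 μm). It takes the group velocity by a central finite difference, an illustrative choice rather than the authors' procedure. This idealized lumped model gives a lower cut-off than the measured 12-GHz value discussed later, which the text attributes to the simplifications of the lumped calculation.

```python
import math

L_K2D = 39e-9     # H, estimate used for Fig. 2a
C = 4.6e-15       # F, inter-strip capacitance over the effective length
a = 1.25e-6       # m, strip periodicity

omega_c = 1.0 / (2.0 * math.sqrt(L_K2D * C))   # cut-off at the Brillouin-zone edge


def omega(k: float) -> float:
    """Dispersion omega(k) = omega_c / |sin(k a / 2)|, for 0 < |k| <= pi/a."""
    return omega_c / abs(math.sin(k * a / 2.0))


def velocities(k: float, rel_step: float = 1e-3):
    """Return (group velocity d omega/dk by central difference, phase velocity omega/k)."""
    step = rel_step * abs(k)
    v_group = (omega(k + step) - omega(k - step)) / (2.0 * step)
    return v_group, omega(k) / k


print(f"cut-off frequency f_c = {omega_c / (2 * math.pi) / 1e9:.2f} GHz")
for frac in (0.9, 0.5, 0.2):
    k = -frac * math.pi / a                     # k < 0 branch, as in Fig. 2a
    vg, vp = velocities(k)
    print(f"k = {-frac:.1f} pi/a: f = {omega(k) / (2 * math.pi) / 1e9:6.2f} GHz, "
          f"v_group = {vg:+.3e} m/s, v_phase = {vp:+.3e} m/s")
```

With these values it gives f_c ≈ 5.94 GHz. On the k < 0 branch, every point pairs a positive group velocity with a negative phase velocity.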

The dispersion relation of our metamaterial has the same form as that of the left-handed transmission line, but with the magnetic inductance replaced with the much larger 2DEG kinetic inductance; the 2DEG kinetic inductance is 1.25 nH μm⁻¹ for a 1-μm-wide 2DEG strip, which is ~2,800 times larger than the same strip’s magnetic inductance, 0.44 pH μm⁻¹ (Supplementary Information, section 1). The effective refractive index derived from the dispersion is $n = -\frac{2c}{a\omega}\sin^{-1}(\omega_c/\omega)$, where $c$ is the speed of light in vacuum, and has the maximum attainable magnitude of $2c/(a\omega_c) = (4c/a)\sqrt{L_{K2D}C}$, which is exceedingly large owing to the large 2DEG kinetic inductance, corresponding to the substantial slowing of the effective wave.
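The following short sketch evaluates the index formula with the same lumped-element estimates used for Fig. 2a ($L_{K2D}$ = 39 nH, C = 4.6 fF, a = 1.25 μm). It confirms that the two expressions for the index scale are identical and that |n| falls as ω rises above ω_c. Because the lumped model ignores losses and longer-range coupling, its magnitudes are illustrative only. They are not expected to match the measured indices. The function name is hypothetical.

```python
import math

C_LIGHT = 299_792_458.0   # speed of light in vacuum, m/s


def index_from_lumped_model(omega: float, omega_c: float, a: float) -> float:
    """n = -(2c / (a*omega)) * asin(omega_c / omega); only defined above cut-off."""
    if omega < omega_c:
        raise ValueError("omega is below the cut-off frequency")
    return -(2.0 * C_LIGHT / (a * omega)) * math.asin(omega_c / omega)


L_K2D, C, a = 39e-9, 4.6e-15, 1.25e-6          # Fig. 2a estimates
omega_c = 1.0 / (2.0 * math.sqrt(L_K2D * C))

# The two forms of the index scale quoted in the text agree: 2c/(a*omega_c) = (4c/a)*sqrt(L*C)
scale_a = 2.0 * C_LIGHT / (a * omega_c)
scale_b = 4.0 * C_LIGHT / a * math.sqrt(L_K2D * C)
print(f"2c/(a*omega_c) = {scale_a:.0f}, (4c/a)*sqrt(LC) = {scale_b:.0f}")

for ratio in (1.2, 2.0, 4.0):                   # omega / omega_c
    n = index_from_lumped_model(ratio * omega_c, omega_c, a)
    print(f"omega = {ratio:.1f} omega_c: n = {n:.0f}")
```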

Microwave scattering experiments with on-chip probing confirm 
this extraordinarily strong negative refraction. The reflection of an 
electromagnetic wave incident on the left on-chip CPW and its trans- 
mission to the right on-chip CPW after propagation through the 
metamaterial are measured over a range of ~1-50 GHz using a vector network analyser. Propagation delays in the two on-chip CPWs and parasitic couplings between them bypassing the metamaterial were separately measured and de-embedded; from the resulting transmission and reflection coefficients, $s_{21}$ and $s_{11}$, at each measurement frequency, we extract, using a well-established method, the effective wave’s phasor change $e^{-ikd}$ due purely to propagation of a distance $d$ across the metamaterial (Supplementary Information, section 3).

Figure 2 | Theory and simulation. a, Plot of $\omega(k) = [2\sqrt{L_{K2D}C}\,|\sin(ka/2)|]^{-1}$, with $L_{K2D}$ = 39 nH and C = 4.6 fF estimated for the structure measured for Fig. 3 (W = 1 μm, a = 1.25 μm, l = 112 μm). The group and phase velocities, $d\omega/dk$ and $\omega/k$, have opposite signs, showing negative refraction; this occurs for both k > 0 and k < 0, but we show only the latter, which is relevant to our measurements. b, Simulated current distributions below (left; 5 GHz) and above (right; 30 GHz) the cut-off frequency. Red and blue colours indicate high and low current densities, respectively. Above the cut-off frequency, regions of high, constant current density are observed, from which the effective strip length $l_e$ is estimated. a.u., arbitrary units.
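The step from the extracted phasor $e^{-ikd}$ to the index $n = kc/\omega$ used for Fig. 3 can be sketched as follows. This illustrative Python code is not the extraction method the authors cite. It assumes the de-embedded phasor is already available, that the frequency sweep is dense enough for phase unwrapping, and that the first point lies on the principal branch (|Re(k)|d < π). The figure of merit |Re(n)/Im(n)| is computed as defined in the text. The check uses synthetic data, not measurements.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def index_from_phasor(freqs_hz, phasor, d_m):
    """Recover complex k from P = exp(-i k d), then n = k c / omega.

    Assumes a dense sweep (phase steps < pi between points) and that the
    first point lies on the principal branch of the phase.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    P = np.asarray(phasor, dtype=complex)
    k_real = -np.unwrap(np.angle(P)) / d_m      # arg P = -Re(k) d
    k_imag = np.log(np.abs(P)) / d_m            # |P| = exp(Im(k) d)
    n = (k_real + 1j * k_imag) * C_LIGHT / (2 * np.pi * freqs)
    figure_of_merit = np.abs(n.real / n.imag)
    return n, figure_of_merit


# Synthetic check (not measured data): n = -300 - 150j at every frequency.
d = 13 * 1.25e-6                                # 13 strips at a = 1.25 um
f = np.linspace(15e9, 45e9, 301)
n_true = -300 - 150j
P = np.exp(-1j * (n_true * 2 * np.pi * f / C_LIGHT) * d)
n_est, fom = index_from_phasor(f, P, d)
print(n_est[0], fom[0])
```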

Figure 3a shows the frequency-wavenumber (f-k) dispersion so obtained at temperatures of 4.2, 10 and 20 K for a 13-strip metamaterial with W = 1 μm, l = 112 μm, $l_e$ = 31 μm and a = 1.25 μm. Because the measured parameters $s_{21}$ and $s_{11}$ set the left-to-right energy propagation direction (that is, the direction of the group velocity) as the positive reference direction, if our metamaterial is negatively refracting, the sign of the extracted wavenumber will be negative with no ambiguity, which is indeed seen in Fig. 3a. Negative refraction is also consistently confirmed in Fig. 3a by the fact that $d\omega/dk$ and $\omega/k$ have opposite signs above the 12-GHz cut-off frequency. This measured dispersion, including the cut-off frequency, differs in its details from the calculation (Fig. 2), which uses lumped circuit elements, ignores losses due to electron scattering in the 2DEG strips and ohmic contacts, and considers only nearest-neighbour capacitive couplings. But it has the same underlying features, demonstrating negative refraction. The dark area in Fig. 3a, where the distinctively spurious behaviour of the dispersion appears, is indicative of the cut-off region, which is irrelevant to the operation of the device (Supplementary Information, section 3). From this f-k dispersion, we obtain the effective refractive index using $n = kc/\omega$, whose real part is as large as −500 (Fig. 3b). This large negative index, which is difficult, if not impossible, to achieve with magnetic inductance, allows drastic device miniaturization and can facilitate ultra-subwavelength localization. The same measurements performed on the 2DEG strip array, but with energy propagation along the strips, yield positive refraction, further highlighting our negative refraction strategy (Supplementary Information, section 5).

We find that |Re(n)| decreases with frequency (Fig. 3b), because as the frequency increases adjacent strips are coupled more capacitively, which increasingly bypasses the electron acceleration effect within each separated strip. Figure 3c shows the figure of merit, |Re(n)/Im(n)|, which here reflects losses due to electron scattering in the 2DEG strips and ohmic contacts. It takes a value of ~2 over a reasonably large part of the negative refraction region, similarly to negative refraction devices using metals at optical frequencies. Figure 3a-c shows that the negative refraction behaviour is essentially the same regardless of temperature (4.2, 10 and 20 K), indicating that the degree of electron scattering in the 2DEG strips and ohmic contacts remains largely the same within this temperature range, not masking the inertia effect. In Fig. 3c, the figure of merit is largest at 10 K instead of at 4.2 K, but these variations with respect to temperature arise mostly from inconsistent probe landings during the multiple calibration steps performed for the measurements at each temperature. Fluctuations at high frequencies, for example in Fig. 3c, are also due to imperfect calibration.

At 297 K, electron scattering in each 2DEG strip becomes so severe that the acceleration effect is completely masked. Equivalently, the strip’s ohmic resistance becomes far larger (~100 kΩ) than the impedance of its kinetic inductance. The strip array then becomes essentially an open circuit, causing the signal to be mostly reflected. This reflection can be seen from the value of the reflection coefficient, $|s_{11}| \approx 1$, measured at 297 K, which differs from $|s_{11}|$ at cryogenic temperatures, where the strip array exhibits negative refraction (Fig. 3d). The transmission coefficient, $|s_{21}|$, also becomes smaller at 297 K because of the open-circuit behaviour, but it is not negligible (Fig. 3d). To understand this, we fabricated exactly the same structure as the previous device, but without the strip array, thus creating an actual open circuit between the two on-chip CPWs. The behaviour of $|s_{21}|$ for this open device at 4.2 K closely resembles that of $|s_{21}|$ for the strip array at 297 K (Fig. 3d). This demonstrates that the behaviour in the strip array at 297 K is due largely to the parasitic coupling between the two CPWs bypassing the
strip array, confirming the open-circuit nature of the device at 297 K (in fact, the phases of the transmission and reflection parameters are also much the same in the open device at 4.2 K and the strip array at 297 K; see Supplementary Information, section 4). These results provide further confirmation that the negative refraction we observe at cryogenic temperatures is due to the kinetic inductance. Also, in most of the dark area in Fig. 3d, the $|s_{11}|$ value of the metamaterial even at cryogenic temperatures is very similar to that of the open device, confirming the cut-off nature in that region.

Figure 3 | Temperature-dependent measurements. a, Dispersion of the 13-strip metamaterial at 4.2, 10 and 20 K. The dark region is indicative of the cut-off behaviour. b, Re(n) versus frequency. c, Figure of merit |Re(n)/Im(n)| versus frequency. d, Parameters $|s_{11}|$ and $|s_{21}|$ of the metamaterial, separately indicated by the dashed ovals, at 4.2, 10, 20 and 297 K. Also shown are $|s_{11}|$ and $|s_{21}|$ of the open-circuit device at 4.2 K. Unlike the data in a, b and c, these are ‘raw’ s parameters without de-embedding, showing the parasitic coupling between the two CPWs.

To examine further the impact of kinetic inductance on negative refraction, we measure a new set of devices with various geometric parameters. Comparison of devices with different values of l (and, thus, $l_e$) for the same values of W and a is especially instructive: changing l scales $L_{K2D}$ and C proportionally to l, and $\omega_c$ inversely to l, so that it affects the index $n = -\frac{2c}{a\omega}\sin^{-1}(\omega_c/\omega)$ through only one parameter, $\omega_c$. Specifically, a device with longer strips, with larger values of $L_{K2D}$ and C, and a smaller $\omega_c$ value, will have a negative index with a larger maximum attainable magnitude, $2c/(a\omega_c)$, reaching the frequency region forbidden for a shorter-strip device. In the frequency region accessible by both longer- and shorter-strip devices, the shorter-strip device will have a larger negative index than the longer-strip device if the two are compared at the same frequency (Supplementary Fig. 4). This clear-cut property emerges in measurements of a pair of devices both with a = 1.25 μm but with differing values of l (112 versus 52 μm) or $l_e$ (31 versus 14 μm) (Fig. 4a). This property is confirmed again in two additional pairs of devices (Fig. 4b), where the refractive index is as large as −700.

Altering the periodicity a, which in general may have to be combined with altering W, affects the index $n = -\frac{2c}{a\omega}\sin^{-1}(\omega_c/\omega)$ in a more complicated manner, owing to simultaneous changes in a and $\omega_c$. For l = 112 μm, we decrease (a, W) first from (1.5 μm, 1 μm) to (1.25 μm, 1 μm) and then to (0.75 μm, 0.6 μm); the first reduction increases C by a factor of 1.2 with $L_{K2D}$ unchanged, and the second increases $L_{K2D}$ by a factor of 1.7 with C unchanged. Owing to the square-root dependence of $\omega_c$ on $L_{K2D}$ and C, $\omega_c$ does not vary as much as a. Thus, a smaller periodicity will yield a larger negative index for the same frequency away from the cut-off regions, as evident in measurements (Fig. 4c). In these a and W variations, the characteristic impedance, and thus the impedance mismatch, is also varied. This results in imperfect de-embedding of the parasitic couplings near cut-offs, obscuring the cut-off behaviours. The tendency of the index to be more negative for smaller periodicities is seen again for l = 52 μm with the same variations of a and W (Fig. 4d). The crossing of the respective data for the devices with a = 1.25 and 1.5 μm is an anomaly that we suspect arises from the impedance mismatch variation.
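The periodicity argument can be made quantitative for the index scale $2c/(a\omega_c) \propto \sqrt{L_{K2D}C}/a$. This small Python sketch uses only the ratios quoted in the text (C × 1.2, then $L_{K2D}$ × 1.7). It shows that each reduction of a increases this scale, meaning $\omega_c$ does not rise fast enough to offset the smaller a. It covers only the idealized scale factor. The measured indices (Fig. 4c, d) also reflect impedance-mismatch effects. The function name is hypothetical.

```python
import math


def index_scale_factor(a_ratio: float, inductance_ratio: float,
                       capacitance_ratio: float) -> float:
    """Factor by which 2c/(a*omega_c) changes when a, L_K2D and C are rescaled.

    Since omega_c = 1 / (2*sqrt(L_K2D*C)), 2c/(a*omega_c) = 4c*sqrt(L_K2D*C)/a,
    so the factor is sqrt(inductance_ratio * capacitance_ratio) / a_ratio.
    """
    return math.sqrt(inductance_ratio * capacitance_ratio) / a_ratio


# (a, W): (1.5, 1) -> (1.25, 1) um raises C by 1.2 with L_K2D unchanged.
step1 = index_scale_factor(a_ratio=1.25 / 1.5, inductance_ratio=1.0, capacitance_ratio=1.2)
# (a, W): (1.25, 1) -> (0.75, 0.6) um raises L_K2D by 1.7 with C unchanged.
step2 = index_scale_factor(a_ratio=0.75 / 1.25, inductance_ratio=1.7, capacitance_ratio=1.0)

print(f"first reduction:  x{step1:.3f}")
print(f"second reduction: x{step2:.3f}")
```

The two steps scale the index magnitude by about 1.31 and 2.17, respectively.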

The exceedingly strong inertia-based negative refraction demon- 
strated here requires a solid-state platform with a very large kinetic 
inductance and low electron scattering. To meet these requirements, 
we used a GaAs/AlGaAs 2DEG at cryogenic temperature. Scaling the 
2DEG metamaterial to higher frequencies by simultaneous reduction 
of the strip length and periodicity (Fig. 4 and Supplementary 
Information, section 2) would relax the condition on electron scattering time (and, thus, temperature); the demonstration of terahertz plasmonic devices at room temperature with GaAs/AlGaAs 2DEGs bodes well for high-temperature applications. Graphene, another type of 2D conductor with high mobility at room temperature, may also
be a platform for terahertz-frequency negative refraction based on a similar kinetic approach. Although electrons in graphene act as massless particles, and thus are non-Newtonian, they still possess kinetic energy, exhibiting plasmonic behaviour with implicit kinetic inductance. In fact, terahertz light-plasmon coupling has been recently observed at room temperature*®. Achieving strong negative refraction based on the kinetic approach with higher isotropy and at optical frequencies using different material systems is also open to further investigation.

Figure 4 | Geometry-dependent measurements at 4.2 K. a, Re(n) for a pair of 13-strip metamaterials with a = 1.25 μm and W = 1 μm, but with different l values. b, Re(n) for another two pairs of 13-strip metamaterials with different l values: (a, W) = (0.75 μm, 0.6 μm) and (1.50 μm, 1 μm). c, Re(n) for l = 112 μm with different a and W values. d, Re(n) for l = 52 μm with different a and W values. c and d are rearrangements of the data in a and b to facilitate comparison for the same l. Each device result is shown above its respective cut-off frequency.


METHODS SUMMARY


We fabricate the devices on GaAs/AlGaAs 2DEG substrates obtained by molecular beam epitaxy. The layer structure above the 2DEG comprises 40-nm Al0.36Ga0.64As, 14-nm Si-doped Al0.36Ga0.64As, 10-nm Al0.36Ga0.64As and a 7-nm GaAs cap. At 4 K, the mobility of the 2DEG is 4.6 × 10⁶ cm² V⁻¹ s⁻¹ and the carrier density is 1.9 × 10¹¹ cm⁻², both in the dark. 2DEG strips are defined by electron beam lithography, followed by wet etching (>71-nm depth) with 150:1:1 H₂O:H₂O₂:NH₄OH. Ohmic contacts are defined by photolithography followed by thermal evaporation of Ni (5 nm)/Au (20 nm)/Ge (25 nm)/Au (10 nm)/Ni (5 nm)/Au (40 nm), and annealing at 420 °C for 50 s. CPWs are defined by photolithography and formed by thermal evaporation of Cr (8 nm)/Au (500 nm).

The microwave scattering analysis is performed in a Lake Shore Cryotronics cryogenic probe station at feedback-controlled cryogenic temperatures in the dark. Ground-signal-ground microwave probes with a pitch of 100 μm, connected to probe arms, are attached to on-chip CPWs. Coaxial cables lead to the probes from an Agilent E8364A network analyser, which generates excitation signals of frequencies up to 50 GHz, delivering −45 dBm power to the devices, and measures the scattering parameters. To see the effect of the metamaterial (strip array) only, we first calibrate the system, at each measurement temperature, up to the tips of the probes by using the NIST-style multiline TRL technique”, and then perform additional de-embedding to remove the on-chip CPW delays and parasitic couplings between the CPWs, which bypass the metamaterial (Supplementary Information, section 3). The CPWs used for this calibration*® are fabricated on undoped GaAs substrates and designed using a Sonnet electromagnetic solver to have a 50-Ω characteristic impedance, which is the characteristic impedance of the network analyser, cables and probes.
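To put a number on the "very large kinetic inductance" of the 2DEG, the standard Drude expression for the kinetic inductance per square of a 2D conductor, L_K = m*/(n_s e²), can be evaluated with the carrier density quoted in the Methods Summary (1.9 × 10¹¹ cm⁻²). This is a back-of-envelope sketch with two stated assumptions. First, the GaAs effective mass m* = 0.067 m_e is a textbook value that the text does not give. Second, treating a strip of length l and width W as l/W squares ignores edge and depletion effects. The function names are hypothetical.

```python
# Physical constants (CODATA values)
M_E = 9.1093837015e-31    # electron rest mass, kg
Q_E = 1.602176634e-19     # elementary charge, C
M_EFF_GAAS = 0.067 * M_E  # assumed GaAs conduction-band effective mass

def sheet_kinetic_inductance(n_s_cm2: float, m_eff: float = M_EFF_GAAS) -> float:
    """Drude kinetic inductance per square, L_K = m*/(n_s e^2), in henries."""
    n_s_m2 = n_s_cm2 * 1e4  # convert cm^-2 to m^-2
    return m_eff / (n_s_m2 * Q_E**2)

def strip_kinetic_inductance(length_um: float, width_um: float, n_s_cm2: float) -> float:
    """Kinetic inductance of a uniform strip: per-square value times length/width."""
    return sheet_kinetic_inductance(n_s_cm2) * (length_um / width_um)

l_sq = sheet_kinetic_inductance(1.9e11)
print(f"{l_sq * 1e9:.2f} nH per square")
print(f"{strip_kinetic_inductance(112, 1, 1.9e11) * 1e9:.0f} nH for a 112 um x 1 um strip")
```

Under these assumptions the sheet value comes out at roughly 1.25 nH per square, so a 112 μm × 1 μm strip would carry on the order of 140 nH of kinetic inductance.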
Increase in observed net carbon dioxide uptake by
land and oceans during the past 50 years


A. P. Ballantyne, C. B. Alden, J. B. Miller, P. P. Tans & J. W. C. White


One of the greatest sources of uncertainty for future climate predictions is the response of the global carbon cycle to climate change¹. Although approximately one-half of total CO₂ emissions is at present taken up by combined land and ocean carbon reservoirs², models predict a decline in future carbon uptake by these reservoirs, resulting in a positive carbon-climate feedback³. Several recent studies suggest that rates of carbon uptake by the land⁴⁻⁶ and ocean⁷⁻¹⁰ have remained constant or declined in recent decades. Other work, however, has called into question the reported decline¹¹. Here we use global-scale atmospheric CO₂ measurements, CO₂ emission inventories and their full range of uncertainties to calculate changes in global CO₂ sources and sinks during the past 50 years. Our mass balance analysis shows that net global carbon uptake has increased significantly, by about 0.05 billion tonnes of carbon per year per year, and that global carbon uptake doubled, from 2.4 ± 0.8 to 5.0 ± 0.9 billion tonnes per year, between 1960 and 2010. Therefore, it is very unlikely that both land and ocean
carbon sinks have decreased on a global scale. Since 1959, approxi- 
mately 350 billion tonnes of carbon have been emitted by humans to 
the atmosphere, of which about 55 per cent has moved into the land 
and oceans. Thus, identifying the mechanisms and locations 
responsible for increasing global carbon uptake remains a critical 
challenge in constraining the modern global carbon budget and 
predicting future carbon-climate interactions. 

Coupled climate/carbon-cycle models predict decreased carbon (C) 
uptake by the land, owing to diminishing productivity and increasing 
respiration, and decreased C uptake by the ocean, associated with 
acidification, changes in ocean mixing and increasing sea surface temperatures, within this century³. Although detecting changes in regional
C sinks is very challenging, several recent studies suggest that C uptake 
by the land and ocean may already be tapering off or declining. 
However, diminished C uptake in these studies is often limited to 
the regional*’° or decadal scale**"”. In addition, trends in sink intensity 
in these studies are inferred from satellite measurements®, simulated 
using models*”° or estimated on the basis of inventories of existing C 
sinks*””. Thus, their implications for long-term variation in the global C 
budget remain uncertain. Here we focus strictly on global-scale observations provided by atmospheric CO₂ measurements and CO₂ emission estimates, and include the full range of uncertainties in each to
estimate changes in global C uptake during the past 50 yr. Although this 
‘top-down’ approach does not provide the detailed process-level 
information of previous studies, it does provide an unbiased assessment 
of changes in global C uptake. 

The growth rate of atmospheric CO₂,

$$\frac{dC}{dt} = \sum F + \sum N \qquad (1)$$

varies in response to one-way fluxes to the atmosphere (ΣF) and net exchange between the Earth's surface reservoirs and atmosphere (ΣN).

The one-way fluxes include those from fossil fuel emissions, including cement production (F_F), and those from land-use change (F_L).

Negative ΣN values represent net uptake of CO₂ and comprise contributions by the land (N_L) and the oceans (N_O). From the observed atmospheric growth rate and estimated fluxes, we can calculate net global CO₂ uptake (ΣN = dC/dt − ΣF) and the airborne fraction (AF = (dC/dt)/ΣF). Although both F_F and F_L are often included in calculating AF and ΣN, it can be argued that only F_F should be included in these calculations because it represents the addition of truly extrinsic C to the modern C cycle, which will be redistributed between the atmosphere, oceans and land. We calculate two versions of ΣN and AF, one with F_F and F_L (Figs 1 and 2 and Table 1), and one with only F_F (Table 1).
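The two diagnostics, ΣN = dC/dt − ΣF and AF = (dC/dt)/ΣF, are element-wise operations on annual series. The sketch below uses NumPy and hypothetical function names. It computes both diagnostics for either choice of ΣF (F_F only, or F_F + F_L). The numbers in the usage lines are round toy values for illustration, not data from the study.

```python
import numpy as np

def total_flux(*fluxes):
    """Sigma F: sum of one-way fluxes (e.g. F_F alone, or F_F and F_L), PgC/yr."""
    return np.sum([np.asarray(f, dtype=float) for f in fluxes], axis=0)

def net_uptake(dc_dt, *fluxes):
    """Sigma N = dC/dt - Sigma F; negative values mean net uptake by land and oceans."""
    return np.asarray(dc_dt, dtype=float) - total_flux(*fluxes)

def airborne_fraction(dc_dt, *fluxes):
    """AF = (dC/dt) / Sigma F (dimensionless)."""
    return np.asarray(dc_dt, dtype=float) / total_flux(*fluxes)

# Toy values (PgC/yr) for a single year, for illustration only
dc_dt, f_f, f_l = 4.0, 8.0, 1.0
print(net_uptake(dc_dt, f_f, f_l), airborne_fraction(dc_dt, f_f, f_l))  # with F_F and F_L
print(net_uptake(dc_dt, f_f), airborne_fraction(dc_dt, f_f))            # F_F only
```

The same functions accept whole annual arrays, so the two versions of ΣN and AF differ only in which flux series are passed in.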

A major difficulty in characterizing the uncertainty of trends in ΣN and AF is that dC/dt errors are negatively autocorrelated in successive years, whereas ΣF errors are positively autocorrelated. The uncertainty of a trend will be overestimated if the negative autocorrelation is not taken into account in the analysis, and will be underestimated if the positive autocorrelation is not taken into account. Calculations of ΣN and AF therefore contain both positive and negative autocorrelations, the balance of which is time dependent. Before 1980, uncertainties in ΣN and AF are dominated by dC/dt errors, whereas towards the end of the record they become increasingly dominated by ΣF errors. To take this error structure properly into account, we used a Monte-Carlo-type approach to simulate the errors. To evaluate the uncertainty in dC/dt since 1980, atmospheric sampling sites were resampled and global mean growth rates were calculated to account for the spatial variability and sparse sampling of atmospheric CO₂, which contribute much more to uncertainty than do measurement and calibration errors. Before 1980, a negatively autocorrelated error component was added to dC/dt simulations. To assess the uncertainty in ΣF, we combined three independent inventories of F_F emissions'*’° with three independent inventories of F_L emissions’”~’’. To each emission inventory, we added positively autocorrelated random errors to account for temporally persistent accounting errors”. These ΣF emission scenarios were then combined with the simulations of dC/dt to estimate trends in ΣN and AF (Methods).

Figure 1 | Trends in the global carbon budget from 1959 to 2010. a, The annual atmospheric CO₂ growth rate (dC/dt). b, Fluxes of C to the atmosphere from fossil fuel emissions (F_F) are plotted in red and those from land-use changes (F_L) are plotted in brown. c, Annual global net C uptake (ΣN) is plotted as a black solid line and is compared with the 10-yr moving average (dark grey line) and the significant linear trend (dashed line) (Table 1). All dark shaded bands represent 1σ uncertainties and all light shaded bands represent 2σ uncertainties. Note that the scale of the y axis in c has been expanded.

Figure 2 | Accumulation of carbon emissions in the atmosphere, on land and in the oceans. a, Sums of emissions from fossil fuels and land-use change integrated from 1959 to 2010 (red) are compared with atmospheric accumulation (blue) and cumulative global uptake (black) by the land and oceans. The dark shaded bands represent 1σ uncertainties and the light shaded bands represent 2σ uncertainties. b, Mean decadal C accumulation rates for the atmosphere (blue) and mean global C uptake rates (grey) are calculated as the sum of C accumulation over a given decade divided by 10 yr. Error bars represent the 2σ uncertainties (Methods).

Significantly increasing linear trends in observed dC/dt (0.054 ± 0.011 billion tonnes of carbon (PgC) per year per year) and estimated F_F (0.115 ± 0.011 PgC yr⁻²) are evident, whereas F_L shows a slight decline between 1959 and 2010 (Fig. 1 and Table 1). Whereas the uncertainty in dC/dt has decreased over time, owing to the addition of network sites monitoring global atmospheric CO₂, the uncertainty in F_F has increased over time, primarily as a result of growing emissions and a greater contribution from emerging economies. We find a significant negative trend in ΣN of −0.052 ± 0.026 PgC yr⁻² (Fig. 1c and Table 1), indicating a large increase in net global C uptake during the past 50 yr. Net global C uptake has grown on average by 0.5 PgC yr⁻¹ per decade, from 2.4 ± 0.8 PgC yr⁻¹ in 1960 to 5.0 ± 0.9 PgC yr⁻¹ in 2010.
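The quoted numbers are arithmetically consistent. A ΣN slope of about −0.052 PgC yr⁻² sustained over the 50 yr from 1960 to 2010 amounts to roughly 2.6 PgC yr⁻¹ of extra uptake. That takes 2.4 PgC yr⁻¹ to about 5.0 PgC yr⁻¹, or about 0.5 PgC yr⁻¹ per decade. The sketch below makes this check explicit and shows an ordinary least-squares slope of the kind reported. Using plain OLS via `numpy.polyfit` is our assumption for illustration; the study's own trend estimates come from its Monte Carlo simulations.

```python
import numpy as np

def linear_trend(years, values):
    """Ordinary least-squares slope, in units of `values` per year."""
    slope, _intercept = np.polyfit(years, values, 1)
    return slope

# Net uptake is -Sigma N, so a Sigma N slope of -0.052 PgC/yr^2 is an uptake slope of +0.052
uptake_slope = 0.052
uptake_1960 = 2.4
uptake_2010 = uptake_1960 + uptake_slope * (2010 - 1960)
print(f"implied 2010 uptake: {uptake_2010:.1f} PgC/yr")   # compare with the 5.0 reported
print(f"per decade: {uptake_slope * 10:.2f} PgC/yr per decade")

# The OLS slope recovers the trend of an exactly linear toy Sigma N series
years = np.arange(1959, 2011)
toy_sigma_n = -2.4 - uptake_slope * (years - 1960)
print(f"{linear_trend(years, toy_sigma_n):.3f}")  # -0.052
```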

Superimposed on this increasing trend in net global C uptake is considerable variability (Fig. 1c). Although net global C uptake increased steadily from 1960 to about 1990, substantial oscillations in net global C uptake have occurred over the past 20 yr. In fact, an increasing trend in ΣN of 0.21 ± 0.10 PgC yr⁻² was observed during the 1990s, but this was followed by an equally large decreasing trend in ΣN of −0.19 ± 0.08 PgC yr⁻² since 2000. Thus, it might be inferred that global C sinks diminished during the 1990s; however, this apparent trend is mainly due to the timing of the eruption of Mt Pinatubo in 1991, which enhanced net global C uptake, and the strong El Niño event in 1998, which diminished net global C uptake".

A commonly used diagnostic for detecting changes in the relative C 
sink efficiency is the airborne fraction, AF. Our analysis reveals that 
trends in AF are highly sensitive to whether land-use emissions are 
included in the global C budget. When only fossil fuel emissions are 
included, there is a significant decreasing trend in AF, indicating an 
increase in uptake efficiency. In contrast, when both land-use and 
fossil fuel emissions are included, the sign of the rate of change of 
AF switches and the uncertainty range is larger, including both positive 
and negative trends (Table 1). There has been considerable debate as to 
whether AF has changed over time and what changes in AF indi- 
cate’*!*?!?_ Our results show that when land-use emissions are 
included, there is no detectable change in AF over the last 50 yr. Our 
findings are corroborated by a recent independent analysis showing no 
significant change in AF since 1850’*. Moreover, it has been demon- 
strated that large changes in uptake efficiency are required to alter AF 
significantly’*. Thus, changes in AF over time are highly sensitive to 
land-use emissions and are difficult to interpret, whereas the significant trend in ΣN provides unequivocal evidence that net global CO₂ uptake continues to increase.

Alternatively, we investigate where anthropogenic emissions have accumulated between 1959 and 2010. Approximately 60 PgC from land use and 290 PgC from fossil fuels have been emitted to the atmosphere, making a total of 350 ± 29 PgC of anthropogenic emissions during that time frame (Fig. 2a). Of these, 158 ± 2 PgC remain in the atmosphere and 192 ± 29 PgC have accumulated in combined land and ocean reservoirs. Thus, 55% of anthropogenic CO₂ emissions have been transferred to the land and oceans, and 45% have remained in the atmosphere. The mean decadal C accumulation rate (Fig. 2b) in land and oceans has increased every decade, from 2.5 ± 1.0 PgC yr⁻¹ during the 1960s to 4.6 ± 0.7 PgC yr⁻¹ since 2000. In the atmosphere, the average decadal accumulation has increased from 1.8 ± 0.12 PgC yr⁻¹ during the 1960s to 4.1 ± 0.06 PgC yr⁻¹ between 2000 and 2010. Although the 1990s seem to be anomalous, in that a much greater proportion of C accumulated on land and in the oceans than in the atmosphere, since 2000 the rate of accumulation in the atmosphere has accelerated.
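The cumulative budget in this paragraph is a simple mass balance, and the sketch below (plain Python, numbers taken from the text) reproduces it. We combine the 2σ uncertainties of the emissions (±29 PgC) and of the atmospheric increase (±2 PgC) in quadrature, which assumes the two are independent. That assumption is ours, but it returns about ±29 PgC, matching the stated uncertainty of the land-plus-ocean total.

```python
import math

land_use, fossil = 60.0, 290.0       # PgC emitted 1959-2010 (rounded, from the text)
emitted, emitted_2sigma = land_use + fossil, 29.0
atmosphere, atmosphere_2sigma = 158.0, 2.0

uptake = emitted - atmosphere        # accumulated in land and ocean reservoirs
uptake_2sigma = math.hypot(emitted_2sigma, atmosphere_2sigma)  # assumes independent errors

print(f"uptake: {uptake:.0f} +/- {uptake_2sigma:.0f} PgC")
print(f"to land and oceans: {uptake / emitted:.0%}, left in air: {atmosphere / emitted:.0%}")
```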

Because the trend in ΣN shows an increase in net global C uptake, N_L and N_O cannot both be decreasing. If regional ocean and land C sinks are indeed diminishing®’°, then to satisfy the global C mass balance, these reduced sinks must be more than compensated for by an increase in the rate of uptake by existing C sinks or the formation of new C sinks. A global inventory of C dynamics in established forests has identified strong regional differences in uptake but a fairly constant global average uptake rate of approximately 2.5 PgC yr⁻¹ over the past 20 yr (ref. 4). Similarly, trends in the partial pressure of CO₂ in the ocean indicate a decrease in C uptake in the Atlantic and an increase in C uptake in the Pacific over the past 30 yr (ref. 10). Thus, evidence for a change in the rate of C uptake by existing regional C sinks on land and in the ocean is equivocal. It has been suggested that widespread drought in the Southern Hemisphere has led to a decrease in terrestrial CO₂ uptake® and that increased surface wind velocity has led to decreased CO₂ uptake in the Southern Ocean*. Unfortunately, the atmospheric CO₂ observations required to validate these reported declines in Southern Hemisphere CO₂ uptake remain scarce.

Table 1 | Trend analyses of parameters and diagnostics of the global C budget (equation (1)) from 1959 to 2010

| Parameters and diagnostics | Slope trend* (PgC yr⁻²) | 95% CI minimum (PgC yr⁻²) | 95% CI maximum (PgC yr⁻²) |
|---|---|---|---|
| **dC/dt** | **0.054** | **0.043** | **0.065** |
| **F_F** | **0.115** | **0.103** | **0.126** |
| F_L | −0.007 | −0.041 | 0.027 |
| **AF with F_F only** | **−0.0016** | **−0.0029** | **−0.0002** |
| AF with F_F and F_L | 0.0012 | −0.0008 | 0.0032 |
| **ΣN with F_F only** | **−0.063** | **−0.076** | **−0.051** |
| **ΣN with F_F and F_L** | **−0.052** | **−0.077** | **−0.026** |

The mean values of slope trend and their 95% confidence interval are calculated from the distribution of simulations (Methods). Significant trends are in bold.

* AF is dimensionless.

From a global mass balance perspective, net uptake of atmospheric CO₂ has continued to increase during the past 50 yr and seems to remain strong. Although present predictions indicate diminished C uptake by the land and oceans in the coming century, with potentially serious consequences for the global climate, as of 2010 there is no empirical evidence that C uptake has started to diminish on the global scale. Therefore, to improve our understanding of carbon-climate interactions, more process studies focusing on mechanisms and regions of increased net CO₂ uptake are required, uncertainty in the global C budget must be reduced by better constraining estimates of fossil fuel emissions, and the global network monitoring atmospheric CO₂ must be expanded to include regions where C uptake is sensitive
to climate variability. A fully comprehensive and credible global 
carbon budget can be achieved only when regional process studies 
are confirmed by global-scale observations. 


METHODS SUMMARY 


We use a Monte Carlo approach to assess uncertainty because it permits us to 
simulate the time-dependent autocorrelation structure of the uncertainty. From 
1980 to 2010, the global atmospheric CO₂ growth rate (dC/dt) is calculated from
annual differences in mean concentration from an array of selected marine boundary 
layer sites. To estimate the uncertainty in dC/dt due to site selection, we construct 100 
bootstrap simulations of alternative observing networks. The pre-1980 value of dC/dt 
is calculated as the annual difference in mean concentration at Mauna Loa and the 
South Pole, with an error structure derived from comparison with the contemporary 
marine boundary layer network extended back to 1959 (Methods). 
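The network bootstrap can be sketched in a few lines. This is a simplified illustration in plain Python, not the NOAA/ESRL processing. Each site is tagged with a single region (an assumption), the network mean is a simple average rather than the MBL curve-fitting method of ref. 23, and data gaps are ignored. The names `bootstrap_network`, `network_growth_rates` and `bootstrap_growth_spread` are hypothetical. The coverage rule (at least one Arctic, tropical, Antarctic, North Atlantic and North Pacific site) follows the Methods.

```python
import random
import statistics

REQUIRED_REGIONS = {"arctic", "tropical", "antarctic", "north_atlantic", "north_pacific"}

def bootstrap_network(region_of, rng):
    """Resample sites with replacement (same network size), redrawing until
    every required region is represented at least once."""
    sites = list(region_of)
    while True:
        sample = rng.choices(sites, k=len(sites))
        if REQUIRED_REGIONS <= {region_of[s] for s in sample}:
            return sample

def network_growth_rates(mdj, sample):
    """Annual growth rates from network-mean MDJ values: mean(MDJ[y+1]) - mean(MDJ[y]).
    `mdj` maps each site to a list of MDJ values (p.p.m.), one per year."""
    n_years = len(next(iter(mdj.values())))
    means = [statistics.fmean(mdj[s][y] for s in sample) for y in range(n_years)]
    return [later - earlier for earlier, later in zip(means, means[1:])]

def bootstrap_growth_spread(mdj, region_of, n_boot=100, seed=0):
    """Per-year standard deviation of growth rates across n_boot resampled networks."""
    rng = random.Random(seed)
    runs = [network_growth_rates(mdj, bootstrap_network(region_of, rng))
            for _ in range(n_boot)]
    return [statistics.stdev(year) for year in zip(*runs)]
```

With 100 resampled networks, as in the text, the spread returned for each year is a sampling-based estimate of the growth-rate uncertainty. Rejection sampling keeps the region constraint simple at the cost of occasional redraws.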

For F_F, we use emission estimates from three global inventories (the Carbon Dioxide Information Analysis Center, BP and the Emissions Database for Global Atmospheric Research) to incorporate possible biases that may persist through the entire record, such as energy-to-carbon conversion factors. Each emission inventory is divided into two groups, developed nations (members of the Organization for Economic Co-operation and Development) and developing nations (non-members). The 2σ error for F_F is then estimated as 5% of emissions for all developed nations and 10% of emissions for all developing nations”’.

For F_L, we use three independent inventories derived from model simulations of land-use and forest statistics'”'*'°, and assign a 2σ error of 50% to each F_L inventory. Because F_F and F_L inventory errors do not vary randomly from year to year, we simulate the uncertainty by generating 500 time series of autocorrelated errors (persistence of ~20 yr) for each inventory. For the case with only F_F, we combine the three F_F inventories, and for the case with both F_F and F_L, we combine each of our F_F inventories with each of our F_L inventories into a 3 × 3 matrix of emission scenarios, for a total of 4,500 ΣF emission simulations (N = 3 × 3 × 500 = 4,500). Values of AF and ΣN were calculated by combining each simulation of ΣF with a randomly chosen simulation of dC/dt. Unless otherwise noted, all uncertainty ranges correspond to 2σ.
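The scenario bookkeeping and trend estimation described here can be sketched with NumPy. The sketch rests on three assumptions that the text does not spell out. First, the k-th error realization of an F_F inventory is paired with the k-th realization of an F_L inventory. Second, trends are ordinary least-squares slopes. Third, the 95% interval is taken from the 2.5th and 97.5th percentiles of the simulated slopes. Function names are hypothetical.

```python
import numpy as np

def combine_scenarios(ff_sims, fl_sims):
    """Sum every F_F inventory with every F_L inventory, realization by realization.
    ff_sims, fl_sims: arrays of shape (n_inventories, n_realizations, n_years).
    Returns Sigma F simulations of shape (n_ff * n_fl * n_realizations, n_years),
    i.e. 3 x 3 x 500 = 4,500 rows for the configuration described in the text."""
    total = ff_sims[:, None] + fl_sims[None, :]    # (n_ff, n_fl, n_real, n_years)
    return total.reshape(-1, ff_sims.shape[-1])

def sigma_n_trends(dc_dt_sims, sum_f_sims, years, rng):
    """Pair each Sigma F simulation with a randomly chosen dC/dt simulation and
    return the OLS slope of Sigma N = dC/dt - Sigma F for every pair."""
    picks = rng.integers(0, dc_dt_sims.shape[0], size=sum_f_sims.shape[0])
    sigma_n = dc_dt_sims[picks] - sum_f_sims            # (n_sims, n_years)
    return np.polyfit(years, sigma_n.T, 1)[0]           # one slope per simulation

def summarize(slopes):
    """Mean slope and 95% interval from the simulated distribution."""
    lo, hi = np.percentile(slopes, [2.5, 97.5])
    return slopes.mean(), lo, hi
```

`np.polyfit` accepts a 2-D `y` and fits each column separately, so all 4,500 trends are estimated in one call.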
METHODS

The global annual growth rates (dC/dt) and uncertainties for 1980 to 2010 were 
calculated for approximately 40 marine boundary layer (MBL) sites from the 
NOAA/ESRL flask network (http://www.esrl.noaa.gov/gmd/ccgg/). These sites are 
called ‘background’ sites because they provide access to well-mixed air that is not 
significantly influenced by nearby sources and sinks of CO₂. Thus, they have low
noise and are representative of large upwind areas. Global averages representative of 
the MBL were calculated following the method in ref. 23. Annual growth rates were 
calculated by subtracting mean values of December and January (MDJ) from MDJ 
values of the following year. The uncertainty in the annual growth rate is 
dominated by having only 40 sites, each of which may have temporal gaps in its 
record. We use a bootstrap method to simulate this uncertainty, by repeating the 
above procedure 100 times. For each realization of a network, 40 sites were 
randomly selected with replacement from the actual sites, so that some sites are 
missing whereas others are represented more than once, but always with at least 
one Arctic, one tropical, one Antarctic, one North Atlantic and one North Pacific 
site selected. On average, the annual growth rate uncertainty (2σ) is 0.38 PgC yr⁻¹. Analysis of the bootstrap results revealed modest positive autocorrelation coefficients for MDJ errors, of 0.244 and 0.086 for lags of 1 yr and 2 yr, respectively, and strong negative autocorrelation coefficients for dC/dt errors, of −0.413, −0.166 and −0.085 for lags of 1 yr, 2 yr and 3 yr, respectively. An MDJ value that is too high tends to produce an estimate of dC/dt that is too high for the preceding year and too low for the following year. For the period before 1980, global MDJ values were calculated from the average MDJ of Mauna Loa and South Pole (MLOSPO), with correction for a bias relative to the global MBL mean (see below), and added autocorrelated noise

$$x_t = b\,\left[\,0.244\,x_{t-1} + 0.086\,x_{t-2} + \varepsilon_t\,\right].$$

Here t denotes the time in years, 0.244 and 0.086 are the 1-yr and 2-yr lag autocorrelation coefficients from above, ε_t is normally distributed random noise and b is a constant to normalize x so as to have a standard deviation of 0.24 p.p.m. This standard deviation is based on the comparison of MLOSPO and the global mean MDJ values for the overlapping period, 1980–2010. The monthly mean MLOSPO data before 1974 are from the Scripps Institution of Oceanography™*. The annual growth rates before 1980 are also determined as the differences between successive MDJ values, leading to a 2σ uncertainty of 0.83 PgC yr⁻¹ for those years.
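Two parts of this paragraph can be made concrete. First, differencing explains the sign flip. If MDJ errors e_t are stationary with lag-1 and lag-2 autocorrelations ρ₁ and ρ₂, the growth-rate errors d_t = e_{t+1} − e_t have lag-1 autocorrelation (2ρ₁ − 1 − ρ₂)/(2(1 − ρ₁)). For independent errors this is −0.5. With the bootstrap values ρ₁ = 0.244 and ρ₂ = 0.086 it is about −0.40, close to the −0.413 reported. This formula is our own back-of-envelope check, not part of the study. Second, the pre-1980 noise can be generated as an AR(2) series. Here we read b as a rescaling applied after the recursion so that the series has the stated 0.24 p.p.m. standard deviation, which is an interpretive assumption. The function names are hypothetical.

```python
import numpy as np

def lag1_corr_of_differences(rho1: float, rho2: float) -> float:
    """Lag-1 autocorrelation of d_t = e_{t+1} - e_t for stationary errors e_t
    with lag-1 and lag-2 autocorrelations rho1 and rho2."""
    return (2 * rho1 - 1 - rho2) / (2 * (1 - rho1))

def ar2_noise(n_years, sd=0.24, phi=(0.244, 0.086), burn_in=100, rng=None):
    """AR(2) noise x_t = phi1*x_{t-1} + phi2*x_{t-2} + eps_t, rescaled to standard
    deviation `sd` (p.p.m.). A burn-in period discards the zero starting values."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(n_years + burn_in)
    eps = rng.standard_normal(n_years + burn_in)
    for t in range(2, n_years + burn_in):
        x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + eps[t]
    x = x[burn_in:]
    return sd * x / x.std()

print(round(lag1_corr_of_differences(0.0, 0.0), 3))      # independent MDJ errors
print(round(lag1_corr_of_differences(0.244, 0.086), 3))  # bootstrap MDJ autocorrelations
noise = ar2_noise(21, rng=np.random.default_rng(42))     # one MDJ error series, e.g. 1959-1979
growth_noise = np.diff(noise)                            # implied dC/dt error, p.p.m./yr
```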

The uncertainty in the observed decadal average growth rate is due to the 
uncertainty in the global MDJ values at the beginning and end of the decade, 
and also to the uncertainty related to potential drift over the 10-yr period of the 
reference gas calibration scale (www.esrl.noaa.gov/gmd/ccl/). The latter is 
negligible compared with the uncertainty in the MDJ values. Since 1990 the uncertainty in the decadal average annual growth rate has been 0.07 PgC yr⁻¹, and before 1980 it was 0.12 PgC yr⁻¹. The uncertainty in the observed cumulative CO₂ increase during 1959–2010 is due to the sampling uncertainty of MDJ values in 1959 and MDJ values in 2011, and to potential changes of the measurement calibration. A comparison between MLOSPO and the MBL average during 1980–2010 shows that MLOSPO is biased low by an amount that depends on the global rate of fossil fuel emissions (in parts per million, MLOSPO − MBL = −0.035 − 0.072F_F), with an error of ~0.3 p.p.m. in 1959. All MLOSPO values before 1980 were bias-corrected. The Scripps calibration scale, which was used for the period 1958–1995, may have been different from the current World Meteorological Organization scale by as much as 0.3 p.p.m., and pressure-broadening corrections of instruments* contribute an additional uncertainty of 0.2 p.p.m. to the calibration. Thus, the uncertainty in the observed cumulative increase during 1959–2010 is probably less than 2 PgC.

Atmospheric CO₂ concentrations are converted from parts per million to petagrams of carbon by using the conversion factor 2.124 PgC p.p.m.⁻¹. This conversion factor implicitly assumes that the annual increases we calculate for the MBL are representative of the entire atmosphere. This assumption may lead to biases in our analysis for two reasons. First, in the continental boundary layer (CBL), CO₂ concentrations tend to be several parts per million higher on average because almost all fossil fuel burning takes place on the continents, despite net uptake by the terrestrial biosphere. Second, mean annual CO₂ concentrations tend to be slightly (~0.2 p.p.m.) lower in the free troposphere than in the MBL of the Northern Hemisphere and are certainly lower in the stratosphere, where the ongoing CO₂ increase lags the MBL. For a mass-averaged lag of 1.5 yr (ref. 26) of the global stratosphere above 200 mbar (~20% of the atmosphere) and an annual growth rate of 2.0 p.p.m., the stratosphere would be lower than the troposphere by 1.5 × 2.0 = 3 p.p.m. Assuming that the MBL can represent the full column produces a high bias of 0.6 p.p.m. (20% × 3 p.p.m.), and a proportionally smaller one when the growth rate was lower. Analysis of the observationally constrained CarbonTracker global mole fraction fields (http://www.esrl.noaa.gov/gmd/ccgg/carbontracker/) shows CBL zonal mean CO₂ enhancements of 2–4 p.p.m. over the MBL, resulting in global MBL underestimation of surface CO₂ concentrations of 0.6 p.p.m. For 2003, CarbonTracker estimates a global MBL average of 374.92 p.p.m. and a whole-atmosphere average of 374.96 p.p.m., suggesting that the CBL − MBL and troposphere–stratosphere biases, both of which are produced by fossil fuel CO₂ emissions, approximately cancel one another. Moreover, because we are dealing here with trends, the absolute values of the bias do not matter as much as their trends.
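The unit conversion and the stratospheric-lag estimate in this paragraph are plain arithmetic. The sketch below reproduces them with the numbers given in the text: a conversion factor of 2.124 PgC p.p.m.⁻¹, a 1.5-yr lag, a 2.0 p.p.m. yr⁻¹ growth rate and a 20% stratospheric mass fraction. As a side check, it also converts the 158 PgC atmospheric accumulation reported in the main text back into p.p.m. The helper names are ours.

```python
PGC_PER_PPM = 2.124  # conversion factor quoted in the Methods

def ppm_to_pgc(ppm: float) -> float:
    return ppm * PGC_PER_PPM

def pgc_to_ppm(pgc: float) -> float:
    return pgc / PGC_PER_PPM

lag_yr = 1.5             # mass-averaged stratospheric lag (ref. 26)
growth_ppm_per_yr = 2.0  # annual growth rate used in the example
strat_fraction = 0.20    # share of atmospheric mass above 200 mbar

strat_deficit = lag_yr * growth_ppm_per_yr     # stratosphere lower by this much, p.p.m.
column_bias = strat_fraction * strat_deficit   # high bias of treating the MBL as the column
print(f"{strat_deficit:.1f} p.p.m. deficit, {column_bias:.1f} p.p.m. bias "
      f"= {ppm_to_pgc(column_bias):.2f} PgC")
print(f"158 PgC = {pgc_to_ppm(158):.1f} p.p.m.")
```

The last line gives roughly 74 p.p.m., the cumulative rise implied by the 158 PgC figure for 1959–2010.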

For F_F, we used values calculated from global emission inventories obtained from the Carbon Dioxide Information Analysis Center'* (CDIAC) for 1959–2010; BP’ for 1965–2010, augmented by CO₂ from cement production; and EDGAR’® for 1970–2010. Data before 1965 and 1970, respectively, were back-filled using CDIAC, and EDGAR inventories have been extended from 2007 to 2010 using energy statistics from BP. The 2σ error for F_F was estimated as 5% of emissions for all nations from the Organization for Economic Cooperation and Development (OECD) and 10% of emissions for all non-OECD nations”. These errors do not vary randomly from year to year: they persist for successive years as long as inventory accounting procedures remain the same, but whenever procedures change, retroactive step revisions are introduced over many years. To address this uncertainty, 500 realizations of F_F were created from each of the three inventories by multiplying the original emissions estimates by a time-dependent autoregressive error factor of 1 + c y_t, where

$$y_t = \mathrm{lag}_1 \times y_{t-1} + \varepsilon_t,$$

t denotes time in years, lag_1 is the autoregressive coefficient for the previous year's value, ε_t is normally distributed random noise and c is a constant factor to normalize the resulting 2σ errors to 5 or 10%. We chose lag_1 = 0.95, so that the 'memory' of errors is ~20 yr. As a sensitivity experiment, we increased the uncertainty in F_F emissions to 10% for OECD nations and 20% for non-OECD nations. Although this increase in uncertainty yielded a wider distribution of trends in ΣN, more than 95% of all trends in ΣN were still negative, indicating that the significant trends in ΣN are robust. We note that if we had assumed that errors for individual countries are independent (instead of grouping them into OECD and non-OECD), the uncertainty estimate of the global emissions would be smaller. In reality, there is a lot of communication between countries about their emission accounting procedures. By including autocorrelation in our Monte Carlo simulations, errors are allowed to change slowly over time, so that the relative errors in cumulative emissions are smaller than the stated 5 or 10% annual uncertainties. Errors 20 yr apart can be of opposite sign and may thus partly cancel each other. Errors that persist over the entire record are considered by using three different fossil fuel emission inventories.
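The autoregressive error model 1 + c·y_t with y_t = lag_1·y_{t−1} + ε_t can be sketched with NumPy as follows. Two details are our assumptions rather than statements in the text. First, c is set analytically from the stationary AR(1) variance, 1/(1 − lag_1²), so that c·y_t has a standard deviation of half the target 2σ. Second, OECD and non-OECD error series are drawn independently. With lag_1 = 0.95 the e-folding time −1/ln(0.95) is about 19.5 yr, consistent with the ~20-yr memory stated above. Function names are hypothetical.

```python
import numpy as np

def ar1_error_factors(n_real, n_years, two_sigma, lag1=0.95, rng=None):
    """Multiplicative error factors 1 + c*y_t with y_t = lag1*y_{t-1} + eps_t.
    c is chosen so that c*y_t has standard deviation two_sigma / 2."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.empty((n_real, n_years))
    # Start from the stationary distribution, whose variance is 1 / (1 - lag1**2)
    y[:, 0] = rng.standard_normal(n_real) / np.sqrt(1 - lag1**2)
    for t in range(1, n_years):
        y[:, t] = lag1 * y[:, t - 1] + rng.standard_normal(n_real)
    c = (two_sigma / 2) * np.sqrt(1 - lag1**2)
    return 1 + c * y

def fossil_realizations(oecd, non_oecd, n_real=500, rng=None):
    """F_F realizations: OECD emissions with 2-sigma 5% errors plus non-OECD
    emissions with 2-sigma 10% errors (groups treated as independent)."""
    rng = np.random.default_rng() if rng is None else rng
    n_years = len(oecd)
    return (np.asarray(oecd) * ar1_error_factors(n_real, n_years, 0.05, rng=rng)
            + np.asarray(non_oecd) * ar1_error_factors(n_real, n_years, 0.10, rng=rng))

print(f"e-folding memory: {-1 / np.log(0.95):.1f} yr")
```

The same `ar1_error_factors` call with `two_sigma=0.5` gives the 50% errors used for F_L, as described in the next paragraph.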

For F_L estimates, we used three different inventories of emissions from land-use change. We used the well-established and updated accounting methods in ref. 27, but note that in a recent reanalysis of tropical deforestation rates, emission estimates since 2000 have been revised downward"’. We also used two independent inventories of F_L emissions derived from temporal maps of land-use change combined with climate model simulations'*’. To account for the serial correlation of errors in F_L, we used the same autoregressive error structure as described previously for F_F, except that the errors were normalized to a 2σ value of 50%.
Persistent near-tropical warmth on the Antarctic
continent during the early Eocene epoch 
The warmest global climates of the past 65 million years occurred
during the early Eocene epoch (about 55 to 48 million years ago), 
when the Equator-to-pole temperature gradients were much smaller 
than today’” and atmospheric carbon dioxide levels were in excess of 
one thousand parts per million by volume**. Recently the early 
Eocene has received considerable interest because it may provide 
insight into the response of Earth’s climate and biosphere to the 
high atmospheric carbon dioxide levels that are expected in the near 
future’ as a consequence of unabated anthropogenic carbon emis- 
sions**. Climatic conditions of the early Eocene ‘greenhouse world’, 
however, are poorly constrained in critical regions, particularly 
Antarctica. Here we present a well-dated record of early Eocene 
climate on Antarctica from an ocean sediment core recovered off 
the Wilkes Land coast of East Antarctica. The information from 
biotic climate proxies (pollen and spores) and independent organic 
geochemical climate proxies (indices based on branched tetraether 
lipids) yields quantitative, seasonal temperature reconstructions 
for the early Eocene greenhouse world on Antarctica. We show that 
the climate in lowland settings along the Wilkes Land coast (at a 
palaeolatitude of about 70° south) supported the growth of highly 
diverse, near-tropical forests characterized by mesothermal to 
megathermal floral elements including palms and Bombacoideae. 
Notably, winters were extremely mild (warmer than 10°C) 
and essentially frost-free despite polar darkness, which provides a 
critical new constraint for the validation of climate models and for 
understanding the response of high-latitude terrestrial ecosystems 
to increased carbon dioxide forcing. 

The climate and ecosystem evolution on Antarctica before the onset of 
continental-scale glaciation at the Eocene/Oligocene transition 
(~33.9 Myr ago) is still poorly resolved owing to the obliteration or 
coverage of potential archives by the Antarctic ice sheet. Available data 
are primarily based on records from the Antarctic Peninsula, which are 
only partly representative of climate and ecosystem conditions on the 
Antarctic mainland’. Terrestrial proxy data generally indicate cool tem- 
perate conditions supporting a vegetation dominated by podocarpaceous 
conifers during the Palaeocene epoch (~65-56 Myr ago) and southern 
beech (Nothofagus) during the middle Eocene epoch (~49-37 Myr ago), 
followed by the final demise of angiosperm-dominated woodlands as a 
result of Cenozoic cooling and the development of the Antarctic cryo- 
sphere around Eocene/Oligocene boundary times*?°. As a result, the terrestrial realm of the high southern latitudes remains virtually a climatic terra incognita for the peak warming phase of the Cenozoic greenhouse world.


We apply terrestrial palynology and palaeothermometry based on 
the methylation index of branched tetraethers (MBT) and the cycliza- 
tion ratio of branched tetraethers (CBT) to a new sedimentary record 
from the Wilkes Land margin, East Antarctica, recovered by the 
Integrated Ocean Drilling Program (IODP Expedition 318 Site 
U1356; see ref. 11 and Fig. 1). These data sets provide the framework 
for a terrestrial climate reconstruction for the early Eocene of 
Antarctica. The record presented here comprises a succession of 
mid-shelfal sediments with excellent chronostratigraphic control 
(Supplementary Fig. 1), representing early Eocene (53.6-51.9 Myr 
ago) greenhouse conditions and, separated by a ~2 Myr hiatus, an
interval of cooling presumed within the latest early Eocene to middle
Eocene (49.3–46 Myr ago; here informally referred to as the ‘mid-Eocene’).
Palynological and geochemical evidence independently
supports the contention that the Wilkes Land sector of Antarctica is
indeed the source region for the Eocene terrestrial palynomorphs
and biomarkers present in the sediment core from Site U1356
(Supplementary Information).

Figure 1 | Site location and continental setting of Antarctica during early
Eocene times. Pre-glacial topographical reconstruction for Antarctica during
Eocene–Oligocene times. Reconstructed elevations are used here to define
minimum elevations for the early Eocene (Supplementary Information). The
reconstruction indicates the likely presence of extensive lowlands along the
Wilkes Land margin and higher-altitude settings in the hinterland, both of
which represent the main catchment area for the terrestrial climate proxies
(sporomorphs and biomarkers) studied at Site U1356. Palaeotopography after
ref. 29; early Eocene coordinates obtained from the Ocean Drilling
Stratigraphic Network after ref. 30.

Non-metric multidimensional scaling techniques show that the 
Eocene sporomorph assemblages at Site U1356 represent two main 
biomes (Fig. 2 and Supplementary Information). A highly diverse para- 
tropical rainforest biome prevailed during the early Eocene, probably 
occupying the coastal lowlands of the Wilkes Land margin. This biome 
includes numerous mesothermal to megathermal taxa characteristic of 
modern subtropical to tropical settings in Australia, New Guinea and 
New Caledonia’. In addition to ferns and tree ferns (Lygodium, 
Cyatheaceae), it is characterized by the presence of palms (Arecaceae), 
Bombacoideae (Malvaceae), Strasburgeria (Strasburgeriaceae), Beauprea 
(Proteaceae), Anacolosa (Olacaceae) and Spathiphyllum (Araceae) 
(Fig. 2). Although these additional taxa occur only in low abundance, 
their presence is highly significant. Because they are pollinated by insects, 
their pollen dispersal in extant rainforests is generally restricted to less 
than 100 m (ref. 13). Hence, even low percentages of their pollen in the 
Site U1356 record indicate that these plants formed a substantial part of 
the Wilkes Land margin vegetation. 

The palm and Bombacoideae pollen not only represent the southern- 
most documented occurrences for both taxa during the Eocene, but, 
importantly, imply that winter temperatures remained substantially 
above freezing. Extant palms occur naturally only in regions with a
coldest-month mean temperature (CMMT) of ≥5 °C (ref. 1). Because
their cold-season temperature requirements increase further when
palms grow under a high partial pressure of atmospheric CO₂, the
CMMT implied by palms during the early Eocene greenhouse world
was at least 8 °C (ref. 14). Even warmer conditions are suggested by the
record of Bombacoideae, which today occur where CMMT > 10 °C.
Because even the most winter-hardy extant palms are severely
damaged by short-term freezing, with a series of consecutive years of
unfavourable climate eventually being lethal'*, winters must have been
essentially frost-free.

Sporomorphs representing a lower-diversity temperate rainforest 
biome, with taxa characteristic of extant forests in montane settings 
of Australia, New Caledonia, New Guinea and New Zealand’’, 
typically account for ~30% of sporomorphs during the early Eocene. 
Characteristic taxa include Nothofagus (fusca type), Araucariaceae, 
Proteaceae and Podocarpus; mesothermal to megathermal, frost- 
sensitive taxa are consistently absent. Judging from its floral composi- 
tion, this temperate rainforest biome occupied cooler environments of 
Wilkes Land located farther inland and/or at higher elevations, and 
therefore provides insight into the climate conditions deeper within 
the Antarctic continent. The coeval existence of a temperate rainforest 
biome in the hinterland and a paratropical rainforest in the lowlands of 
the Wilkes Land margin indicates a pronounced continental interior- 
to-coastal temperature gradient during the early Eocene. 

A markedly different vegetation pattern is documented for the mid- 
Eocene time interval, with a strong expansion of the Nothofagus- 
dominated temperate rainforest biome and the near-extirpation of 
the paratropical rainforest biome; notably, the remainder of the latter 
biome is devoid of megathermal elements (Fig. 2). Hence, our data 
suggest that the temperate rainforest biome became dominant over the 
entire catchment area of Site U1356, also extending into the coastal 
regions, and that relict mesothermal components of the paratropical 
rainforest biome persisted only in localized pockets along the Wilkes 
Land margin. These shifts in dominance and floral composition indi- 
cate a strong cooling, which in light of the cold-season sensitivity of 
meso- and megathermal taxa was particularly pronounced in winter 
temperatures, and a strong weakening of the temperature gradient 
between coastal and montane regions of the Wilkes Land margin. 

To further quantify the sporomorph-derived palaeoclimatic
information, we carried out bioclimatic analyses using the nearest
living relative concept’® to reconstruct the mean annual temperature
(MAT), the mean winter and summer temperatures (MWT and MST)
(Fig. 3), and the mean annual precipitation (Supplementary Fig. 5).
These results were critically assessed through a comparison with
reconstructions using a different methodology that also relies on the
nearest living relative concept (the coexistence approach of ref. 17;
see Supplementary Information). Because the two recognized biomes
represent distinct environments with different climatic conditions, our
approach allows a spatiotemporally differentiated view of the climate
evolution of Wilkes Land from early Eocene peak warmth through the
onset of mid-Eocene cooling. Our temperature estimates for the
paratropical rainforest biome show that climate along the Wilkes
Land margin was generally warm until at least 51.9 Myr ago. Most
samples indicate temperatures of 16 ± 5 °C for MAT, 11 ± 5 °C for
MWT and 21 ± 5 °C for MST, although a small number also yield
colder or warmer values (Fig. 3). A markedly cooler climate emerges
for the temperate rainforest biome, in particular for MAT and MWT,
for which most samples yield values of 9 ± 3 °C and 5 ± 2 °C, respectively.
For MST, the data show a strong scatter between 14 ± 1 °C and
18 ± 3 °C, and the values partly overlap with those for the paratropical
rainforest biome. For both biomes, the mean annual precipitation was
persistently more than 100 cm yr⁻¹ (Supplementary Information).

For the mid-Eocene interval, our reconstructions based on the relicts
of the paratropical rainforest biome suggest a pronounced cooling,
although this trend is partly within the error limits of the data. The
estimated MAT is 14 ± 3 °C, which represents a decline of ~2 °C from
the early Eocene. Our data also indicate a decline in MWT and MST,
although these trends are again within the error limits. Temperatures
reconstructed for the temperate rainforest biome are comparable to
those from the early Eocene, which is consistent with there being no
major changes in the composition of this biome between the two intervals.

Independent support for a warm terrestrial climate during the early
Eocene and marked cooling during the mid-Eocene comes from our
MBT/CBT palaeothermometry data (Fig. 3). Soil temperatures of
~24–27 °C are estimated for the early Eocene, and ~17–20 °C for
the mid-Eocene. These temperatures fall close to the MSTs derived
for the paratropical rainforest biome. This suggests that the branched
tetraethers in Site U1356 sediments originated from coastal lowland
soils of the Wilkes Land sector of Antarctica and could imply a bias
of the MBT/CBT proxy towards summer temperatures, although
such a bias has not been observed in modern mid-latitude climates
(Supplementary Information).

Figure 2 | Data from Site U1356 for the early Eocene to mid-Eocene. a, Core
recovery. m.b.s.f., metres below sea floor. b, Geological age’’. c, Relative
abundances of selected sporomorphs representative of the paratropical and
temperate rainforest biomes. d, Relative abundances of Proteaceae pollen. Data
based on samples with counts of ≥90 specimens. e, Number of sporomorph
species rarefied at 280 grains. The number of sporomorph species from the
early Eocene is significantly higher than that from the mid-Eocene
(Mann–Whitney test, P < 0.000005).

Figure 3 | Climate reconstruction for the Wilkes Land sector of Antarctica
during the early and mid-Eocene derived from Site U1356. a, Core recovery.
b, Geological age'’. c, Relative abundances of sporomorphs representing the
temperate and paratropical rainforest biomes. d, Estimates of MAT, MWT and
MST for the temperate (blue) and paratropical (red) rainforest biomes, based
on the methodology of ref. 16. Error bars represent the minimum and
maximum estimates determined using that method. The vertical dashed line
marks the minimum requirements of Bombacoideae for the mean temperature
of the coldest month. e, Temperatures derived from the MBT/CBT index, with
horizontal error bars indicating the calibration standard error (±5 °C). This
error refers to absolute temperature estimates across all environmental settings
of the modern calibration; thus, the error of the within-record variation is much
smaller. Relative sporomorph abundances and sporomorph-based climate
estimates are based on samples with counts of ≥90 specimens.
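Figure 2e reports sporomorph species numbers rarefied to 280 grains. The text does not state which rarefaction formula was used, so the sketch below assumes the classic Hurlbert expectation for sampling without replacement, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the count of species i and N the total count. The function name `rarefied_richness` and the example counts are hypothetical.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n grains, drawn
    without replacement from a sample with the given per-species counts
    (Hurlbert-style rarefaction; assumed here, not the paper's stated method)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("subsample size must be between 1 and the total count")
    ways_total = comb(total, n)
    # A species is missed only if all n grains come from the other species.
    return sum(1 - comb(total - c, n) / ways_total for c in counts)


# Hypothetical counts for one sample (not data from Site U1356).
sample = [120, 60, 40, 25, 15, 10, 5, 3, 2, 1, 1, 1]
print(round(rarefied_richness(sample, 280), 2))
```

Rarefying every sample to the same n makes richness comparable across samples with different count totals, which is what allows a like-for-like comparison of early and mid-Eocene assemblages.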

Our data, which provide continental temperature reconstructions 
for the high southern latitudes during the early Eocene greenhouse 
world, show that paratropical conditions persisted in the lowlands of 
the Wilkes Land margin of Antarctica from at least 53.9 to 51.9 Myr ago. 
Notably, our estimates yield a constraint on Antarctic winter temperatures
during peak greenhouse conditions. The CMMT and MWT
estimates of ≥10 °C and 11 ± 5 °C, respectively, compare favourably
with deep-water temperatures of ~11°C in the marine realm at this 
time®’*. Because early Eocene deep waters were sourced from 
downwelling surface waters in the high southern latitudes off 
Antarctica’, winter temperatures in these regions cannot have dropped 
much below 11 °C. Although our MWT estimates are not representative 
of the Antarctic continent as a whole, they bear implications for the 
current debates on the general ability of climate models to reproduce 
extreme greenhouse conditions and the response of polar ecosystems 
to increased CO₂ forcing.

When run with conservative estimates of atmospheric CO₂ levels for
the early Eocene, fully coupled climate models yield high-latitude 
terrestrial winter temperatures considerably below freezing”, and they 
produce warm (that is, above-freezing) winters in the terrestrial high 
latitudes only when radiative forcing is strongly enhanced*'. Hence, 
our winter temperatures for Wilkes Land provide a critical reference 
point for understanding the climate dynamics of the early Eocene 
greenhouse world. They are in remarkably close agreement with simulated
MWTs for the Wilkes Land region when radiative forcings equivalent
to 2,240 p.p.m.v. and 4,480 p.p.m.v. CO₂ are applied (ref. 21),
suggesting that enhancing radiative forcing in models may help resolve 
the persistent data-model mismatch. However, factors other than extre- 
mely high atmospheric greenhouse gas forcing may have contributed to 
the winter warmth along the Wilkes Land sector of Antarctica. They 
include winter cloud radiative forcing over high-latitude land masses”,
possibly connected to high ocean-to-land moisture transport’. Our 
precipitation estimates (Supplementary Fig. 5) and the presence of 
rainforest biomes consistently suggest high moisture availability 
throughout the year, thus lending support to the operation of this
mechanism in the Wilkes Land sector of Antarctica. A high moisture
flux from the ocean was facilitated by the presence of extremely warm 
surface waters in the Australo-Antarctic gulf, resulting from the sub- 
tropically derived, clockwise-flowing proto-Leeuwin current**. Warm 
surface waters off Wilkes Land are documented by mass occurrences of 
the subtropical dinoflagellate cyst Apectodinium". 

Our data also provide new insights into the physiological ecology of 
high-latitude forests, which are subject to seasonally extreme changes 
in light levels. The ~50 days of polar darkness on the Wilkes Land
margin pose severe constraints on the balance between the plants’
carbon gain by photosynthesis and carbon loss by respiration. Because carbon loss by
respiration typically increases with temperature’, it has been argued 
that polar winters must have been cool rather than warm*’. Our MBT/ 
CBT temperature data, which under the most conservative (that is, 
‘coldest’) assumption represent MST, are typically between 24 and 
27°C for the early Eocene (Fig. 3). They are similar to, although 
possibly slightly warmer than, the terrestrial MST predicted by climate 
models using high radiative forcing (20-25 °C; ref. 21). Over a wide 
range of CO₂ forcing, the models yield a temperature seasonality of the
order of 10°C, thus suggesting a MWT of 10-15 °C. This evidence, 
which is strictly independent of our vegetation-based climate recon- 
structions, contradicts the scenario of cold winters on Wilkes Land, 
therefore suggesting that respiration losses under a highly seasonal 
polar light regime were compensated for by a factor other than temperature.
We suggest that the high atmospheric CO₂ levels of the early
Eocene greenhouse climate were a decisive factor in the physiological 
ecology of high-latitude forests, most probably through causing a 
reduction in carbon respiration during the polar winter”” and an 
increase in photosynthetic carbon gain during the growing season”. 
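The length of the polar night at the site follows from geometry alone, which gives a rough check on the ~50 days quoted above. The sketch below counts the days on which the Sun's noon elevation, its daily maximum, stays below the horizon at a given latitude. It assumes the modern obliquity (23.44°), a simple sinusoidal approximation for solar declination and the conventional −0.833° sunrise horizon (refraction plus solar semidiameter), and it ignores twilight. Eocene obliquity, the exact palaeolatitude and the definition of "darkness" all shift the count noticeably, so it is only an order-of-magnitude check. `polar_night_days` is a hypothetical helper.

```python
import math


def polar_night_days(lat_deg, obliquity_deg=23.44, horizon_deg=-0.833):
    """Number of days in a 365-day year on which the Sun never rises.

    Uses a sinusoidal declination approximation and treats a day as
    'polar night' when even the noon (maximum) solar elevation lies
    below horizon_deg. Southern latitudes are negative.
    """
    dark_days = 0
    for day in range(1, 366):
        # Approximate solar declination (degrees) for this day of the year.
        decl = obliquity_deg * math.sin(math.radians(360.0 / 365.0 * (284 + day)))
        # Noon elevation is the highest the Sun gets on that day.
        noon_elevation = 90.0 - abs(lat_deg - decl)
        if noon_elevation < horizon_deg:
            dark_days += 1
    return dark_days


# Palaeolatitude of the Wilkes Land margin given in the text (about 70° S).
print(polar_night_days(-70.0))
```

At 70° S this yields a count on the order of 50 to 60 days, consistent with the figure in the text; a site a few degrees farther poleward or a stricter darkness criterion would lengthen it.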

Our new data from the peak early Eocene greenhouse world indicate 
that a highly diverse forest vegetation containing evergreen elements 
can successfully colonize high-latitude, warm winter environments 
when atmospheric CO₂ levels are high. Depending on the thresholds
in atmospheric CO₂ required by such plants, the duration of polar
winters and the temperatures at which such forcing factors become
significant, these results have important implications for the composition
of high-latitude terrestrial ecosystems in a future anthropogenic
greenhouse world with high atmospheric CO₂ levels and drastic polar
amplification of warming.


METHODS SUMMARY 


Palynology. Between 10 and 15 g of sediment was processed per sample. The dried 
sediment was weighed and spiked with Lycopodium spores to facilitate the 
calculation of absolute palynomorph abundances. Chemical processing com- 
prised treatment with 30% HCl and 38% HF for carbonate and silica removal, 
respectively. Ultrasonication was used to disintegrate palynodebris. Residues were 
sieved over a 10-µm mesh and mounted on microscope slides, which were
analysed at ×200 and ×1,000 magnification. A detailed, step-by-step processing
protocol is given in Supplementary Information. 
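Spiking each sample with a known number of Lycopodium spores turns slide counts into concentrations: the ratio of fossil palynomorphs to marker spores counted scales the number of spores added. The following is a minimal sketch of this standard marker-grain calculation; the function name and all input numbers are hypothetical, and the exact procedure used for Site U1356 is the one given in the Supplementary Information.

```python
def palynomorphs_per_gram(counted, lycopodium_counted, lycopodium_added, dry_weight_g):
    """Absolute abundance (specimens per gram of dry sediment) from a Lycopodium spike."""
    if lycopodium_counted <= 0 or dry_weight_g <= 0:
        raise ValueError("need at least one counted marker spore and a positive weight")
    # Scale the count by (spores added / spores counted), i.e. by the inverse of
    # the fraction of the residue that was examined, then normalize by weight.
    return counted * lycopodium_added / lycopodium_counted / dry_weight_g


# Hypothetical numbers: 300 sporomorphs and 150 Lycopodium spores counted,
# 20,000 spores added to 12.5 g of dried sediment.
print(palynomorphs_per_gram(300, 150, 20_000, 12.5))  # 3200.0
```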

Sporomorph-based climate reconstructions. Bioclimatic analyses were carried 
out following ref. 16, but with data sources including Southern Hemisphere taxa, 
allowing the development of climatic profiles for each taxon as described in 
Supplementary Information. The results of the bioclimatic analyses were critically 
assessed through the application of the coexistence approach” to the data set using 
the same underlying database. Supplementary Table 1 lists all taxa that were 
evaluated through the bioclimatic analyses and the coexistence approach, their 
botanical affinity and the nearest living relatives used in the analyses. 
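The coexistence approach used to cross-check the bioclimatic analyses rests on a simple idea: the climate of a fossil assemblage must lie within the tolerance range of every nearest living relative present, so the estimate is the interval in which all ranges overlap. The sketch below shows only that core intersection step. The taxon names and tolerance ranges are hypothetical placeholders rather than values from Supplementary Table 1, and the full method's handling of outlier taxa whose ranges do not overlap is omitted.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (minimum, maximum) of a climate variable, e.g. MAT in °C


def coexistence_interval(tolerances: Dict[str, Interval]) -> Optional[Interval]:
    """Interval in which the climatic ranges of all nearest living relatives overlap.

    Returns None if the ranges do not all overlap (the full coexistence
    approach would then examine and possibly exclude outlier taxa).
    """
    if not tolerances:
        raise ValueError("no taxa given")
    lower = max(lo for lo, _ in tolerances.values())
    upper = min(hi for _, hi in tolerances.values())
    return (lower, upper) if lower <= upper else None


# Hypothetical MAT tolerances (°C) for three nearest living relatives.
mat_ranges = {
    "relative_A": (12.0, 27.0),
    "relative_B": (9.5, 22.0),
    "relative_C": (14.0, 29.0),
}
print(coexistence_interval(mat_ranges))  # (14.0, 22.0)
```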

Organic geochemistry. For MBT/CBT analyses, freeze-dried, powdered 
samples were extracted with an accelerated solvent extractor using a 9:1 (v/v) 
dichloromethane (DCM):methanol solvent mixture. The obtained extracts were 
separated over an activated Al₂O₃ column, using 9:1 (v/v) hexane:DCM, 1:1 (v/v)
hexane:DCM, 1:1 (v/v) ethylacetate:DCM and 1:1 (v/v) DCM:methanol, into
apolar, ketone, ethylacetate and polar fractions, respectively. The polar fractions 


76 | NATURE | VOL 488 | 2 AUGUST 2012 


containing the branched tetraether lipids were analysed by HPLC/APCI-MS 
(high-performance liquid chromatography/atmospheric pressure chemical 
ionization mass spectrometry) using an Agilent 1100 LC/MSD SL. MBT/CBT 
indices were calculated and converted into temperature estimates as described 
in Supplementary Information. 
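The Methods defer the index definitions and the temperature calibration to the Supplementary Information. Purely as an illustration, the sketch below uses the widely cited MBT and CBT definitions together with the soil calibration of Weijers et al. (2007), MBT = 0.122 + 0.187 × CBT + 0.020 × MAT; whether this is the calibration applied to the Site U1356 data is an assumption. Branched GDGT abundances are keyed by the conventional compound labels I, Ib, Ic, II, IIb, IIc, III, IIIb and IIIc, and the sample values are hypothetical.

```python
import math

BRANCHED = ("I", "Ib", "Ic", "II", "IIb", "IIc", "III", "IIIb", "IIIc")


def mbt(gdgt):
    """Methylation index: share of tetramethylated branched GDGTs (I, Ib, Ic)."""
    tetramethylated = gdgt["I"] + gdgt["Ib"] + gdgt["Ic"]
    return tetramethylated / sum(gdgt[name] for name in BRANCHED)


def cbt(gdgt):
    """Cyclization ratio: -log10 of ring-bearing (Ib, IIb) over acyclic (I, II) GDGTs."""
    return -math.log10((gdgt["Ib"] + gdgt["IIb"]) / (gdgt["I"] + gdgt["II"]))


def temperature_weijers2007(mbt_value, cbt_value):
    """Invert MBT = 0.122 + 0.187*CBT + 0.020*MAT (assumed calibration) for MAT in °C."""
    return (mbt_value - 0.122 - 0.187 * cbt_value) / 0.020


# Hypothetical relative abundances (arbitrary units), not Site U1356 data.
sample = {"I": 6.0, "Ib": 1.0, "Ic": 1.0, "II": 0.5, "IIb": 0.5,
          "IIc": 0.5, "III": 0.5, "IIIb": 0.5, "IIIc": 0.5}
print(round(temperature_weijers2007(mbt(sample), cbt(sample)), 1))
```

The ±5 °C calibration error quoted in the Fig. 3 caption applies to absolute values obtained this way; shifts within a single record are better constrained.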
Universal species–area and endemics–area
relationships at continental scales


David Storch, Petr Keil & Walter Jetz


Despite the broad conceptual and applied relevance of how the 
number of species or endemics changes with area (the species–area
and endemics–area relationships (SAR and EAR)), our understanding
of the universality and pervasiveness of these patterns across taxa
and regions has remained limited. The SAR has traditionally been
approximated by a power law’, but recent theories predict a triphasic
SAR in logarithmic space, characterized by steeper increases in 
species richness at both small and large spatial scales*°. Here we 
uncover such universally upward accelerating SARs for amphibians, 
birds and mammals across the world’s major landmasses. Although 
apparently taxon-specific and continent-specific, all curves collapse 
into one universal function after the area is rescaled by using the 
mean range sizes of taxa within continents. In addition, all EARs 
approximately follow a power law with a slope close to 1, indicating 
that for most spatial scales there is roughly proportional species 
extinction with area loss. These patterns can be predicted by a simu- 
lation model based on the random placement of contiguous ranges 
within a domain. The universality of SARs and EARs after rescaling 
implies that both total and endemic species richness within an area, 
and also their rate of change with area, can be estimated by using 
only the knowledge of mean geographic range size in the region and 
mean species richness at one spatial scale. 

The scale dependence of species richness has implications for all 
biodiversity patterns’. The SAR has been used to extrapolate species 
richness across spatial scales and also to estimate species extinctions 
after habitat loss’* (but see refs 9, 10), typically relying on its particular 
universal properties. However, the universality of the shape of the SAR 
has been questioned'’. The nested SARs (in which smaller sample 
areas are located within larger ones) are classically described as a power 
law across most spatial scales’’, but current theoretical approaches 
predict that species richness first increases steeply with area at a 
decelerating rate, then increases roughly linearly in logarithmic space, 
and accelerates upwards again when sample areas approach the size of 
individual species’ geographic ranges**. In contrast to the well- 
documented curvature over small areas'*™, data availability has so far 
hindered generalizations about the SAR at large scales. The EAR is the 
relationship between the area of a region and the number of species 
restricted (that is, endemic) to it. The EAR provides information on the 
number of species that may go extinct if parts of the area are destroyed 
or transformed*'*'°, because being endemic to the area would imply 
that any local extinction is also global. Despite the potential of the EAR 
in biodiversity science and conservation”'*”’, its empirical shape at 
biogeographic scales has remained largely undocumented. The slope 
of the EAR at smaller spatial scales is expected to be connected to the 
slope of the SAR at large scales, because an increase in species richness 
with increasing study plot area (the SAR) corresponds to a decrease in 
the number of species that are restricted (that is, endemic) to the 
remaining area’® (that is, the area not included in the study plot; see 
Supplementary Discussion and Supplementary Figs 1 and 2). 

Here we construct fully nested continental SARs and
EARs for all amphibians, birds and mammals (see refs 18 and 19 for
data description and validation, and Methods and Supplementary
Table 1 for details). SARs for all continents and taxa accelerate upward
in log–log space (Fig. 1a–c). Differences in the SAR position along the
y axis correspond to known differences in total species richness of
individual continents and taxa”*”’; for example, birds have consistently
higher species richness for a given area than do mammals, whereas
amphibians typically show low richness. However, amphibians also
show much steeper SARs than other taxa (Fig. 1d), and Eurasia has the
steepest SARs for all taxa. An assessment of local slopes (derivatives) of
the SARs illustrates the upward-increasing nature of the logarithmic
SAR (Fig. 1d and Table 1). Considerable differences in this increase
appear among taxa, most clearly between Eurasian amphibians and
North American birds. We do not find evidence for the first phase of
the triphasic SARs, which confirms the expectation that this phase
occurs only when the number of individuals becomes limited*"; that
is, at scales considerably finer than those made possible by the current
grain size of global distribution data’.

Figure 1 | SARs and EARs across five continents and three vertebrate
classes. a–c, e–g, The SARs for amphibians (a), birds (b) and mammals
(c) reveal an upward-accelerating shape for logarithmic axes, whereas EARs for
amphibians (e), birds (f) and mammals (g) are more or less linear.
d, h, Confirmation by plotting the local slopes (derivatives) of the relationships
for each continent (d for SARs and h for EARs). All relationships were
constructed by using a strictly nested quadrat design. Grey lines correspond to a
power law with a slope of 1; that is, proportionality between area and the
number of species. S is the mean number of species, E is the mean number of
endemics, and A is the area in km².

Table 1 | Slopes of the EARs and SARs calculated by using the nested quadrat design

Taxon        Continent     EAR slope          SAR slope (lower half)   SAR slope (upper half)
Birds        Eurasia       1.92 (1.70–2.06)   0.21 (0.19–0.22)         0.41 (0.37–0.45)
Birds        Africa        1.39 (1.22–1.65)   0.21 (0.19–0.23)         0.40 (0.34–0.45)
Birds        N. America    1.25 (1.00–1.61)   0.19 (0.17–0.20)         0.29 (0.27–0.32)
Birds        S. America    1.30 (1.15–1.44)   0.17 (0.16–0.19)         0.39 (0.35–0.43)
Birds        Australia     1.96 (1.20–2.67)   0.15 (0.12–0.17)         n.a.
Mammals      Eurasia       1.36 (1.26–1.43)   0.26 (0.24–0.27)         0.48 (0.44–0.52)
Mammals      Africa        1.24 (1.15–1.39)   0.23 (0.21–0.25)         0.48 (0.42–0.54)
Mammals      N. America    1.26 (1.11–1.43)   0.21 (0.19–0.22)         0.40 (0.34–0.45)
Mammals      S. America    1.21 (1.10–1.36)   0.17 (0.16–0.18)         0.44 (0.41–0.46)
Mammals      Australia     1.46 (1.09–1.85)   0.21 (0.19–0.23)         n.a.
Amphibians   Eurasia       1.15 (1.05–1.31)   0.35 (0.31–0.40)         0.70 (0.54–0.86)
Amphibians   Africa        1.13 (1.05–1.25)   0.29 (0.26–0.32)         0.64 (0.53–0.75)
Amphibians   N. America    1.00 (0.84–1.16)   0.26 (0.22–0.31)         0.58 (0.43–0.70)
Amphibians   S. America    0.93 (0.86–1.02)   0.26 (0.24–0.28)         0.58 (0.49–0.68)
Amphibians   Australia     1.12 (0.80–1.53)   0.27 (0.23–0.33)         n.a.

The slopes were estimated by using linear regression on logarithms of the mean number of species for each area (also logarithmically transformed). The EAR slopes were calculated across the whole range of areas.
The SAR slopes were calculated separately for the lower and upper half of the analysed areas (cut-off: log10 area = 6.1) to provide measures of both the lower and upper ends of the upward-accelerating SARs. For
local slope estimates at each area see Fig. 1d. To give a general representation of the possible range of slopes that would be detected if biodiversity data were incomplete, we randomly selected only 10% of all
possible positions of sampling windows, repeated the procedure 500 times and estimated the lower and upper 95% quantiles of the slopes obtained from the resampled data (see Methods).
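Table 1 and Fig. 1d rest on two simple computations on the log-transformed curves: an ordinary least-squares slope of log S on log A (over all areas, or separately below and above the log10 area = 6.1 cut-off), and local slopes, that is, finite-difference derivatives between successive areas. A minimal NumPy sketch, assuming mean richness values are already available for a series of increasing areas. The function names, the synthetic curve and the convention of assigning the cut-off value itself to the lower half are illustrative choices, and the resampling of window positions behind the confidence ranges is not reproduced.

```python
import numpy as np


def loglog_slope(area, richness):
    """OLS slope of log10(richness) against log10(area)."""
    slope, _intercept = np.polyfit(np.log10(area), np.log10(richness), 1)
    return slope


def half_slopes(area, richness, cutoff_log10_area=6.1):
    """Slopes fitted separately at/below and above the cut-off area."""
    area = np.asarray(area, dtype=float)
    richness = np.asarray(richness, dtype=float)
    lower = np.log10(area) <= cutoff_log10_area
    return (loglog_slope(area[lower], richness[lower]),
            loglog_slope(area[~lower], richness[~lower]))


def local_slopes(area, richness):
    """Finite-difference slopes d log S / d log A between successive areas."""
    return np.diff(np.log10(richness)) / np.diff(np.log10(area))


# Synthetic, upward-accelerating curve in log-log space (not data from the study).
area = np.logspace(4, 7.5, 15)  # km^2
log_a = np.log10(area)
richness = 10 ** (0.5 + 0.1 * log_a + 0.02 * log_a ** 2)
print(half_slopes(area, richness))
print(local_slopes(area, richness).round(3))
```

For a pure power law every local slope equals the exponent; an upward-accelerating SAR shows up as local slopes that rise with area and an upper-half slope that exceeds the lower-half slope, as in Table 1.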

The nonlinear shapes of SARs stand in striking contrast to those
observed for EARs (Fig. 1e–g). All continents and taxa show a consistent
and seemingly linear increase in number of endemics with increasing
area in logarithmic space. Local slopes of EARs are reasonably
invariant with scale, taxon and continent (Fig. 1h), although some show 
a slight increase at areas above 3 X 10° km7. Except for generally steeper 
slopes in birds, EAR slopes tend to vary between 0.75 and 1.5 and are 
often close to 1 (Fig. 1h and Table 1), indicating that the number of 
endemic species increases more or less proportionally with area. 
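What a slope close to 1 means can be made concrete with a small worked example. If the EAR follows a power law E = cA^z, then the endemics confined to a destroyed block covering a fraction f of a region, relative to the endemics of the whole region, scale as f^z, independent of c. This assumes the destroyed block behaves like one of the sampled windows (a contiguous area), which is a simplification; the slopes used below are taken from Table 1 only for illustration.

```python
def endemic_fraction_lost(area_fraction, ear_slope):
    """E(f*A) / E(A) for a power-law EAR E = c*A**z: the share of a region's
    endemics confined to a destroyed block covering area_fraction of it."""
    if not 0.0 <= area_fraction <= 1.0:
        raise ValueError("area_fraction must lie in [0, 1]")
    return area_fraction ** ear_slope


for label, z in [("proportional (z = 1)", 1.0),
                 ("S. American amphibians (z = 0.93)", 0.93),
                 ("Eurasian birds (z = 1.92)", 1.92)]:
    print(label, round(endemic_fraction_lost(0.1, z), 3))
```

With z = 1, destroying 10% of the area removes 10% of the endemics; slopes well above 1 mean that small losses remove proportionally fewer endemics, and slopes below 1 proportionally more.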

The increasing slope of the SAR at large spatial scales is predicted to 
be associated with increasing species spatial turnover as sample areas 
approach the sizes of species’ geographic ranges” * (see Supplementary 
Discussion). We contend that the SAR curvature may thus be depend- 
ent on an ‘effective’ range size equal to the mean range size. Because of 
its similar foundation, we predict that EARs will show similar range 
size dependence. We therefore rescaled all area axes such that one areal 
unit corresponds to the mean species geographic range for a given 
continent and taxon: 


$$A_r = \frac{A}{R_{t,c}} \qquad (1)$$

where $A_r$ is the rescaled area, $A$ is the area of the study plot and $R_{t,c}$ is the
mean range size for taxon $t$ and continent $c$. In addition we rescaled the
vertical axis to represent species richness proportional to the richness of
an area equal to $R_{t,c}$:

$$S_r = \frac{S_A}{S_{R(t,c)}} \qquad (2)$$

and

$$E_r = \frac{E_A}{E_{R(t,c)}} \qquad (3)$$

where $S_r$ and $E_r$ are the rescaled counts of species and endemics, $S_A$ and
$E_A$ are mean counts for a given area, and $S_{R(t,c)}$ and $E_{R(t,c)}$ are mean
richness values for the area that equals the mean geographic range size
of a given taxon and continent. Under this transformation, the original
SARs and EARs collapsed into an approximately single curve (Fig. 2a, b).
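Equations (1) to (3) can be applied to any tabulated curve. The sketch below assumes areas are sorted in increasing order and that the count at an area equal to the mean range size R(t,c), which will rarely coincide with a sampled area, is obtained by linear interpolation in log–log space; that interpolation step, the function name and the example curves are assumptions rather than details from the paper.

```python
import numpy as np


def rescale_curve(area, counts, mean_range_size):
    """Rescale an SAR or EAR following equations (1)-(3).

    Returns (A_r, X_r): area in units of the mean range size, and counts
    (species or endemics) relative to the count at an area equal to it.
    """
    area = np.asarray(area, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Count at A = mean range size, interpolated linearly in log-log space (assumption).
    log_count_at_r = np.interp(np.log10(mean_range_size), np.log10(area), np.log10(counts))
    return area / mean_range_size, counts / 10 ** log_count_at_r


# Two hypothetical taxa whose SARs share a shape but differ in range size and richness.
area = np.logspace(3, 7, 9)
small_ranged = rescale_curve(area, 40 * (area / 1e5) ** 0.3, mean_range_size=1e5)
large_ranged = rescale_curve(area, 300 * (area / 1e6) ** 0.3, mean_range_size=1e6)
```

After rescaling, both hypothetical taxa fall on the same curve, X_r = A_r^0.3. Read in reverse, a known rescaled curve plus the richness at one scale and the mean range size is enough to reconstruct a region's curve, which is the estimation route discussed later in the text.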


This collapse was also observed by using an alternative sampling 
design based on continental (self-similar) instead of quadratic shapes 
of study plots (Fig. 2c, d; see Methods). The steeper SARs observed for 
amphibians (Fig. 1a) can thus be attributed to their considerably
smaller ranges: the SAR increases rapidly at smaller absolute areas, 
and the slope continues to increase, whereas the other two taxa with 
much larger range sizes never approach a similarly steep relationship 
(see Supplementary Fig. 11). 
Figure 2 | SARs and EARs after rescaling. a, b, After expressing the area in
units corresponding to mean range size and standardizing the vertical axis so 
that it represents species richness relative to mean richness for a given unit area, 
all the SARs and EARs collapse into one universal relationship, although some 
deviations exist, particularly in small areas. For EARs (b), birds in Eurasia and 
Australia represent the only considerable deviations. In these regions many 
endemics with small ranges occur at the edge of the continents, whereas the 
areas for which the EARs were calculated were taken predominantly from the 
centre of the continent. c, d, These universal relationships also exist for SARs 
and EARs constructed using the alternative, continental shape design, in which 
sample areas are not quadrats but keep the shape of the given continent (see 
Methods and Supplementary Discussion). Solid black lines refer to rescaled 
SARs and EARs predicted by simulations based on a random placement of 
simplified ranges (model 3; see Fig. 3 and Methods). Solid grey lines all have
a slope of 1. For explanations of $A_r$, $S_r$ and $E_r$ see equations (1), (2) and (3).
To test whether the observed continental relationships and their
collapse can be predicted on the basis of simple assumptions concern- 
ing spatial distribution of species ranges, we developed several spatially 
explicit simulation models (see Methods). Specifically, we assessed the 
degree to which the observed SARs and EARs may be recovered by 
using placement of species ranges modelled as simple contiguous 
shapes. Although multiple evolutionary and ecological factors ulti- 
mately determine the exact location and sizes of geographic ranges 
(which in turn affect SAR and EAR), we find that just a few assump- 
tions are sufficient to explain the observed scaling relationships 
(Fig. 3). Among our models those with independent (models 3 and 
4; see Fig. 3 and Methods for details), as opposed to clumped (models 1 
and 2), placement of geographic ranges produce SAR and EAR shapes 
as well as collapses of all curves that are essentially identical to those 
observed. The resulting patterns are not particularly sensitive to the 
exact shape of the frequency distribution of range sizes, because model 
3 (which retained the observed range size distributions) provided very
similar patterns to those of model 4, which did not retain them (see
Supplementary Discussion). In contrast with models 3 and 4, the 
observed geographic range locations and sizes do show spatial non- 
independence, but much less so than in model 2, and the effect on 
the observed collapse is minimal (Supplementary Figs 13-15 and 
Supplementary Discussion). The empirical patterns (Fig. 2) are 
therefore expected whenever a species distribution is represented by 
more or less independently located contiguous ranges, with the mean 
range size of species being the only biologically relevant variable affect- 
ing the exact properties of the patterns. 
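The range-placement models can be sketched in a few dozen lines. The simplified version below is loosely modelled on model 3: square ranges are placed at random so that they may overlap the domain only partly (which dampens the mid-domain effect), are clipped to the domain, and are then sampled with strictly nested, non-overlapping square windows. A species counts towards S if its clipped range intersects a window and towards E if the clipped range lies entirely inside it. The grid size, number of species and the uniform distribution of range side lengths are hypothetical; the paper draws range sizes from the empirical distributions, and the function name is ours.

```python
import random


def simulate_sar_ear(domain=32, n_species=200, seed=1):
    """Mean species (S) and endemics (E) per window for nested square windows.

    domain must be a power of two so that windows of side 1, 2, 4, ... tile it.
    Returns a list of (window_area_in_cells, mean_S, mean_E).
    """
    if domain < 1 or domain & (domain - 1):
        raise ValueError("domain must be a power of two")
    rng = random.Random(seed)
    ranges = []
    for _ in range(n_species):
        side = rng.randint(1, domain)  # hypothetical range-size distribution
        # Lower-left corner anywhere that still leaves at least one cell inside,
        # so ranges can straddle the domain edge (partial overlap).
        x0 = rng.randint(1 - side, domain - 1)
        y0 = rng.randint(1 - side, domain - 1)
        # Clip to the domain; intervals are half-open [lo, hi).
        ranges.append((max(x0, 0), min(x0 + side, domain),
                       max(y0, 0), min(y0 + side, domain)))

    results = []
    w = 1
    while w <= domain:
        s_total = e_total = n_windows = 0
        for wx in range(0, domain, w):
            for wy in range(0, domain, w):
                n_windows += 1
                for xl, xh, yl, yh in ranges:
                    if xl < wx + w and xh > wx and yl < wy + w and yh > wy:
                        s_total += 1  # range intersects the window
                        if wx <= xl and xh <= wx + w and wy <= yl and yh <= wy + w:
                            e_total += 1  # range lies entirely inside: endemic
        results.append((w * w, s_total / n_windows, e_total / n_windows))
        w *= 2
    return results


for area_cells, mean_s, mean_e in simulate_sar_ear():
    print(area_cells, round(mean_s, 2), round(mean_e, 2))
```

Because windows are strictly nested, both mean S and mean E can only grow with window area; rescaling the output with equations (1) to (3) would be the next step before comparing it with Fig. 2.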

Figure 3 | Rescaled SARs and EARs predicted by four simulation models of
range placement. Range sizes were drawn from the empirical frequency
distributions of each taxon and domain (black areas in Supplementary Fig. 3)
and were placed into a domain with a size equal to that of the regions analysed
for the original SARs and EARs using the strictly nested quadrat design
(Supplementary Fig. 3 and Methods; for the results based on the regions
analysed using the continent shape design see Supplementary Fig. 12). a, Model
1 is based on a random placement of square ranges within the domain,
producing a higher concentration of range midpoints in the centre of the
domain (mid-domain effect). b, Model 2 places all the square ranges in a
corner of the domain, to illustrate the role of non-random range position.
c, Model 3 is based on a random placement of square ranges but minimizes the
mid-domain effect by allowing model ranges to overlap the domain only partly.
The observed frequency distribution of range sizes is retained, but the resulting
range shapes within the domain become variable. d, Model 4 is similar to model
3 and completely avoids the mid-domain effect but does not retain the
originally observed range size distribution (see Methods for details). We
produce a fitted line for model 3 results to highlight its match with the empirical
patterns (see Fig. 2): black lines represent the Lowess regression line for the
rescaled SAR plot (smoothing span 0.2) and the linear regression line for the
rescaled EAR plot. Solid grey lines all have a slope of 1.

The universality of SARs and EARs after rescaling implies that
knowledge of mean species richness (of either endemics or all species)
at one scale allows the estimation of the whole SARs and EARs with
only one additional piece of information: the mean range size for species in
the region. Although this information may not usually be available
without knowledge of the geographic distribution of all species (and
thus also the whole SARs and EARs), in some cases it may be estimated
reasonably well from similar taxa or representative subtaxa. However,
the larger deviations from the universal relationship in small areas
(Fig. 2a) will probably limit the accuracy of richness estimates towards
smaller spatial scales. This is expected, because the SAR and EAR at
these scales are determined by the complex shapes of individual
geographic ranges, which go beyond simple differences in range sizes.
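The estimation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure. It assumes the rescaled relationship has the form $S_A / S_{\bar R} = F(A / \bar R)$, where $\bar R$ is the mean range size and $F$ is the universal curve. The function name `predict_sar` and the toy curve in the example are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence


def predict_sar(
    area_obs: float,
    richness_obs: float,
    mean_range_size: float,
    universal_curve: Callable[[float], float],
    areas: Sequence[float],
) -> list[float]:
    """Predict mean richness at several areas from a single observation.

    Hypothetical sketch: assumes the rescaled SAR obeys
    S_A / S_Rbar = F(A / Rbar), where F is `universal_curve` and
    Rbar is the mean range size, in the same units as the areas.
    """
    # Richness at an area equal to the mean range size, backed out
    # from the one observed (area, richness) pair.
    s_rbar = richness_obs / universal_curve(area_obs / mean_range_size)
    # Re-apply the universal curve at every requested area.
    return [s_rbar * universal_curve(a / mean_range_size) for a in areas]


# Toy curve F(x) = sqrt(x), chosen only for illustration.
print(predict_sar(4.0, 20.0, 16.0, lambda x: x ** 0.5, [16.0, 64.0]))  # [40.0, 80.0]
```

In practice $F$ would be the empirically fitted rescaled curve, so the accuracy of such a prediction depends on how well that curve holds at the areas of interest.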

In contrast to the strongly nonlinear EARs previously reported for
small spatial scales, the slope of the continental EARs assessed here
was generally close to 1. Therefore, for the scales examined, the number
of species predicted to go extinct is roughly proportional to the area
destroyed. However, the uncertainty in any such predictions would be
large (Table 1), and the shape of the EAR is unlikely to hold at the
finer spatial scales relevant to conservation. In addition, the
relationships addressed here comprise only the direct effect of shrinking
habitable area and not the cascading effects of species interactions or
other consequences of species loss. Nevertheless, for the scales analysed
here, the relatively steep slopes of all EARs suggest high extinction
rates from area loss.
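The following rough worked illustration shows what a slope close to 1 implies. For simplicity only, it assumes that the EAR is a single power law, $E(A) = E_{\mathrm{total}}(A/A_{\mathrm{total}})^z$, across the whole region. The function name is hypothetical, and like the text, the calculation covers only the direct effect of area loss.

```python
def fraction_endemics_lost(fraction_destroyed: float, z: float) -> float:
    """Fraction of a region's species lost when a fraction of its area is destroyed.

    Assumes (as a simplification) a power-law EAR,
    E(A) = E_total * (A / A_total) ** z, over the whole range of areas.
    """
    if not 0.0 <= fraction_destroyed <= 1.0:
        raise ValueError("fraction_destroyed must lie in [0, 1]")
    return fraction_destroyed ** z


# With z = 1, losses are proportional to the area destroyed.
print(fraction_endemics_lost(0.1, 1.0))  # 0.1
# Other slopes break the proportionality.
print(fraction_endemics_lost(0.5, 2.0))  # 0.25
```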

Spatial biodiversity patterns, including the SAR and EAR, are
affected by many factors, ranging from the spatial arrangements of
continental masses and biomes and the patterns of diversification and
dispersal playing out within and among them, to population dynamics
and interspecific interactions. However, all of these processes
ultimately translate into patterns of geographic distribution of individual
species, which then represent a proximate driver of spatial
macroecological patterns. We have shown that species range sizes have
a key role in large-scale upward-accelerating SARs, and consequently
also in EARs at specific spatial scales. Because local SAR slopes are
mathematically related to species spatial turnover, the
determinants of species range sizes predictably determine global patterns in
species spatial turnover, the SAR and the EAR. These findings suggest
that an integrated evolutionary and ecological understanding of just a
few attributes of regional biota can enable far-reaching predictions to
be made about the scaling of biodiversity.


METHODS SUMMARY 


We calculated mean species richness across all possible quadrats of given size by
using the strictly nested quadrat method. We calculated SARs for continental
regions able to accommodate quadrats encompassing 20 × 20 grid cells, except for
Australia, where, because of its smaller size, we used 14 × 14 (Supplementary Fig. 3).
This avoids potential biases resulting from different species richness in marginal
areas that could not be sampled by large quadrats (see Supplementary Fig. 4). In a
second sampling design we adjusted plot boundaries to mimic continental shapes
(Supplementary Fig. 5), which increased the amount of edge area that could be
included in regions with more complicated geometry, such as North America. This
procedure yielded qualitatively similar, but noisier, results (see Methods,
Supplementary Discussion, Supplementary Fig. 6 and Supplementary Table 2 for
details). We used four simulation models of range placement within a domain to
examine the effects of range position (random or spatially clumped) and shape
(constant or varying) on the resulting SARs and EARs (see Fig. 3 for more details on the
models).


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Description of the strictly nested quadrat (SNQ) design. We calculated mean
species richness across all possible quadrats of given size by using the strictly
nested quadrat (SNQ) method, which is a Type I curve in Scheiner’s terminology.
It was implemented by using a moving-window algorithm. SNQ implies
mutual dependence of species richness at different spatial scales, as the species
richness of larger areas encompasses all the species of the smaller plots within
them. In this design, species richness of small areas can thus never be higher than
the richness of the larger areas within which they sit. The overlapping nature of
SNQ could be criticized for introducing some pseudoreplication, as each point in
space is sampled repeatedly by many samples of a given area. However, every
SAR construction method has its limitations, and the SNQ design has several
advantageous properties: first, it keeps the spatial extent and shape of the
sampling window identical at all scales; second, it provides the most accurate
estimate of expected species richness for a randomly located plot of a given area;
and third, it is the only design in which local slope can be directly related to β
diversity and patterns of species’ spatial aggregation.

We started the SNQ procedure with the largest sampling window, which we
moved continually across the world grid, counting the number of species
captured within each window position. For the SAR construction we selected only
areas that could contain the largest window without also including any sea or
major water bodies (the black areas in Supplementary Fig. 3). Very large window
sizes enabled us to explore SARs for a large range of areas, but only for a limited
proportion of a continent. In contrast, small window sizes fitted a larger
proportion of continents, but the resulting SARs encompassed only a limited
range of areas. We therefore initially set the largest window size to all values
between 5 × 5 and 35 × 35 grid cells, and subsequently chose 20 × 20 grid cells
because they were sufficiently representative of both continental coverage and
range of areas in the SAR. For Australia, however, we used 14 × 14 grid cells
instead.

We then reduced the size of the sampling window to 19 × 19 and moved it
continually within black areas, counting species within each possible window
position. We repeated this procedure until the size of the sampling window
reached 1 × 1 grid cell. The mean species richness for a given area ($S_A$) was
calculated as

$$S_A = \frac{1}{n}\sum_{i=1}^{n} S_{A,i}$$

where $S_{A,i}$ is the number of species in the $i$th position of the sampling
window of area $A$, and $n$ is the number of all possible positions of that window
within the black area (Supplementary Fig. 4).
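The moving-window procedure and the averaging formula above can be sketched in a few lines of Python. This is a hypothetical, unoptimized illustration. The grid representation (a nested list of per-cell species sets plus a boolean mask standing in for the black area) and the function names are assumptions, and the sketch simply admits every window that lies wholly within valid cells.

```python
from itertools import product


def snq_mean_richness(grid, valid, k):
    """Mean species richness over all k x k window positions (SNQ sketch).

    grid[r][c] is the set of species recorded in cell (r, c); valid[r][c]
    is True for cells inside the analysed (black) area. Only windows lying
    wholly within valid cells are counted.
    """
    rows, cols = len(grid), len(grid[0])
    counts = []
    for r, c in product(range(rows - k + 1), range(cols - k + 1)):
        cells = [(r + i, c + j) for i in range(k) for j in range(k)]
        if not all(valid[i][j] for i, j in cells):
            continue  # window would include excluded cells (e.g. sea)
        species = set().union(*(grid[i][j] for i, j in cells))
        counts.append(len(species))
    if not counts:
        raise ValueError(f"no admissible {k} x {k} window positions")
    # S_A = (1/n) * sum of richness over the n admissible window positions
    return sum(counts) / len(counts)


def snq_curve(grid, valid, max_k):
    """SAR as {area in cells: mean richness}, shrinking the window from max_k to 1."""
    return {k * k: snq_mean_richness(grid, valid, k) for k in range(max_k, 0, -1)}


grid = [[{"a"}, {"a", "b"}], [{"c"}, set()]]
valid = [[True, True], [True, True]]
print(snq_curve(grid, valid, 2))  # {4: 3.0, 1: 1.0}
```

Because every larger window contains the species of the smaller windows nested within it, the resulting curve is non-decreasing in area, matching the strict nesting described above.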

Description of continent shape (CS) design. We developed a novel and
alternative SAR construction design, which we call the continent shape (CS) design. It has
the advantage that, unlike SNQ, it can show SARs that include areas as large as
whole continents. It also keeps the shape of the sampling window approximately
constant. Its only disadvantage is that it is not strictly nested, because the complex
shapes of sampling windows cannot be placed everywhere within the complex
shape of a given continent (Supplementary Figs 4 and 5). Therefore the coverage
(the black area in Supplementary Fig. 3) for different sizes of sampling windows
can vary, and hence different places within a given continent are not equally
represented in plots of different areas.

The CS design works as follows. We first counted the number of species in a
whole continent (such as Africa). We then multiplied the coordinates of each grid
cell within Africa by a constant k (0 < k < 1) to obtain an approximate smaller
representation of the African continental shape, which we then used as a moving
sampling window in the same way as in the SNQ algorithm. We used this approach
for five major land masses: North America, South America, Africa, Australia and
Eurasia (Europe and Asia combined). We manually excluded most islands. The
principle of the algorithm is further described in Supplementary Figs 4 and 5. The
black area covered by the CS design is illustrated in Supplementary Fig. 3.
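The shrink-and-slide procedure can be sketched as follows. This is an illustrative approximation, not the authors' implementation. Cells are (row, column) tuples, the scaled coordinates are floored to the grid (the rounding rule is an assumption), and a placement counts only if every cell of the shrunken shape lies on the continent. The function names are hypothetical.

```python
import math


def shrink_shape(cells, k):
    """Approximate a continent outline scaled by k (0 < k < 1).

    Multiplies each cell's coordinates by k and floors them to the grid;
    the flooring rule is an assumption of this sketch.
    """
    if not 0 < k < 1:
        raise ValueError("k must lie strictly between 0 and 1")
    shrunk = {(math.floor(r * k), math.floor(c * k)) for r, c in cells}
    r0 = min(r for r, _ in shrunk)
    c0 = min(c for _, c in shrunk)
    # Translate so the shape's offsets start at (0, 0).
    return {(r - r0, c - c0) for r, c in shrunk}


def cs_mean_richness(species_by_cell, continent, k):
    """Mean richness over all placements of the shrunken shape inside the continent.

    species_by_cell maps (row, col) -> set of species; continent is a set of cells.
    """
    shape = shrink_shape(continent, k)
    rows = [r for r, _ in continent]
    cols = [c for _, c in continent]
    counts = []
    for dr in range(min(rows), max(rows) + 1):
        for dc in range(min(cols), max(cols) + 1):
            window = {(r + dr, c + dc) for r, c in shape}
            if window <= continent:  # the window must lie wholly on land
                species = set().union(
                    *(species_by_cell.get(cell, set()) for cell in window)
                )
                counts.append(len(species))
    return sum(counts) / len(counts) if counts else math.nan


continent = {(r, c) for r in range(4) for c in range(4)}
species_by_cell = {cell: {cell} for cell in continent}  # one unique species per cell
print(cs_mean_richness(species_by_cell, continent, 0.5))  # 4.0
```

Repeating the calculation for a series of k values yields one point of the SAR per window size. As noted above, the set of admissible placements changes with k, so coverage is not constant across areas.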

Quantifying variation of SARs and EARs. Our results are based on mean values
of the number of species or endemics for each area. Because we have analysed all
possible plots on the whole Earth, it is not straightforward to express the variation
around these ‘mean’ curves. Standard statistical tools assume that measured values
represent samples from some larger universe (population), so they cannot be used to
calculate the error of the estimated mean values when we have invoked the whole
population, not just samples of it. Thus, we only calculate characteristics concerning
the distribution of values, namely percentiles (Supplementary Figs 7-10). Standard
regression tools cannot be used for the same reasons, so to estimate some statistics
concerning the curves themselves, we resampled the values and estimated the
possible range of slopes by randomly selecting 10% of the possible positions of
sampling windows, repeating the procedure 500 times and estimating the lower and
upper 95% quantiles of the slopes obtained from these resampled data (Table 1 and
Supplementary Table 2).
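A minimal sketch of the resampling scheme follows. It assumes that the slope of interest is the ordinary least-squares slope of log mean richness on log area. It also reads "lower and upper 95% quantiles" as the 2.5% and 97.5% quantiles. Both readings, and the function names, are assumptions rather than details given in the text.

```python
import math
import random
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def resampled_slope_interval(richness_by_area, fraction=0.1, reps=500, seed=0):
    """Range of log-log SAR slopes under repeated subsampling of window positions.

    richness_by_area maps each window area to a list of richness values, one
    per window position. Returns the 2.5% and 97.5% quantiles of the slopes
    (an assumed reading of 'lower and upper 95% quantiles').
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs, ys = [], []
        for area, values in richness_by_area.items():
            m = max(1, round(fraction * len(values)))  # e.g. 10% of positions
            mean = statistics.fmean(rng.sample(values, m))
            if mean > 0:  # log is undefined for zero richness
                xs.append(math.log(area))
                ys.append(math.log(mean))
        slopes.append(ols_slope(xs, ys))
    cuts = statistics.quantiles(slopes, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]
```

Fixing the seed makes the interval reproducible. With real data the values within each area differ, so the two returned quantiles bracket the slope rather than coinciding.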
Estimation of $S_{\bar R}$ and $E_{\bar R}$. In equation (2) we needed the mean species
richness of an area equal to the mean range size ($S_{\bar R}$). The use of a sampling
window of constant shape in gridded data poses the problem that the mean range
size ($\bar R$) is mostly not exactly equal to any area ($A$) of the sampling window. We
therefore took the mean species richnesses $S_{A_1}$ and $S_{A_2}$ in sampling windows of
areas $A_1$ and $A_2$ that were closest to $\bar R$ and satisfied $A_1 < \bar R < A_2$. We then
calculated $S_{\bar R}$ from a local power-law approximation of the SAR curve. The scaling
exponent of the power law was $(\log S_{A_2} - \log S_{A_1})/(\log A_2 - \log A_1)$, and hence

$$S_{\bar R} = \exp\left[\log S_{A_1} + (\log \bar R - \log A_1)\,\frac{\log S_{A_2} - \log S_{A_1}}{\log A_2 - \log A_1}\right].$$

The same approach was used to calculate $E_{\bar R}$.
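The local power-law interpolation above translates directly into code. In the sketch below, the function name and argument layout are hypothetical, and natural logarithms are used to match the `exp` in the formula. The same function would give $E_{\bar R}$ if passed mean numbers of endemics instead of species.

```python
import math
from bisect import bisect_left


def richness_at_mean_range_size(areas, richness, r_bar):
    """Estimate S at area r_bar by local power-law interpolation.

    Uses the window areas A1 < r_bar < A2 closest to r_bar and the local
    exponent z = (log S2 - log S1) / (log A2 - log A1).
    """
    pairs = sorted(zip(areas, richness))
    xs = [a for a, _ in pairs]
    i = bisect_left(xs, r_bar)
    if i < len(xs) and xs[i] == r_bar:
        return pairs[i][1]  # an exact window area needs no interpolation
    if i == 0 or i == len(xs):
        raise ValueError("r_bar lies outside the range of sampled areas")
    (a1, s1), (a2, s2) = pairs[i - 1], pairs[i]
    z = (math.log(s2) - math.log(s1)) / (math.log(a2) - math.log(a1))
    return math.exp(math.log(s1) + (math.log(r_bar) - math.log(a1)) * z)


# Window areas in grid cells (1x1, 2x2, 4x4, 5x5) and their mean richness.
print(richness_at_mean_range_size([1, 4, 16, 25], [10, 20, 40, 50], 9))  # approximately 30.0
```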

Models of range placement. We sought to explore how the rescaled SARs and
EARs can be influenced by the spatial position of species’ ranges (random or
aggregated), the shape of ranges (uniform or variable) and the range-size
frequency distribution. We developed four models in which square geographic
ranges of species were placed on an artificial square continent (Fig. 3), an
approach similar to that in refs 2 and 29. The sizes of these ranges were drawn
from the empirical distribution of a given taxon within a given domain. The
domain is the black area for which SARs and EARs were explored (Fig. 3). Each
model had a domain of approximately the same size as the domains within the real
continents (the black areas in Supplementary Fig. 3). We replicated our
simulations by using the domain size and empirical range-size frequency distribution
for both the SNQ design (smaller domains) and the CS design (larger domains). We
performed a simulation for each taxon, each continent and each model and plotted
the results in the form of rescaled SARs and EARs (as in Fig. 2). The models
were characterized as follows.

Model 1: random placement, uniform shape of ranges. This model randomly
places species’ ranges of uniform (square) shape strictly inside the domain. This
model incurs a strong mid-domain effect, leading to both higher species richness
in central areas and relatively higher species richness for larger areas. The reason is
that, for uniform range shapes, large ranges will necessarily reach the central region
of the domain and will therefore inevitably be sampled by larger sampling windows
(that is, no position of the sampling window can avoid them).

Model 2: non-random placement, uniform shape of ranges. This model is very 
similar to model 1. The only difference is that instead of being placed randomly, all 
ranges are placed into one corner of the domain. Although the placement of range 
midpoints is then more balanced with respect to their central-peripheral position, 
this model still leads to the over-representation of large ranges in the central area of 
the domain as a result of the uniform shape of all ranges, including large ones, 
which cannot be avoided by any placement of the sampling window. 
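Models 1 and 2 can be sketched together, since they differ only in where each square is anchored. In this hypothetical sketch the domain is a square grid, and each range area is converted to the side of a square by rounding its square root. The function returns per-cell species richness. None of these names or conventions come from the paper.

```python
import math
import random


def place_square_ranges(range_sizes, domain, rng, in_corner=False):
    """Per-cell richness for square ranges on a domain x domain grid.

    in_corner=False: each range is placed uniformly at random wholly inside
    the domain (model 1). in_corner=True: every range is anchored at the
    same corner (model 2). Converting an area to a square side by rounding
    its square root is an assumption of this sketch.
    """
    richness = [[0] * domain for _ in range(domain)]
    for size in range_sizes:
        side = min(domain, max(1, round(math.sqrt(size))))
        if in_corner:
            r = c = 0
        else:
            # randint is inclusive, so the square always stays inside the domain
            r = rng.randint(0, domain - side)
            c = rng.randint(0, domain - side)
        for i in range(r, r + side):
            for j in range(c, c + side):
                richness[i][j] += 1
    return richness


# 50 ranges of 36 cells (6 x 6) on a 10 x 10 domain: under model 1 every
# such square must cover the central cells, illustrating the mid-domain effect.
m1 = place_square_ranges([36] * 50, 10, random.Random(1))
print(m1[5][5], m1[4][4])  # 50 50
```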

Model 3: random placement, variable shape of ranges. Here randomly placed
species’ ranges may extend beyond the continental domain. This weakens the
mid-domain effect (but does not eliminate it completely); however, as a consequence,
the shapes of ranges that are inside the domain are no longer uniform (they are cut
by the domain boundary). The algorithm operates as follows. (1) Draw a range size
from a given distribution of range sizes. (2) Place the range randomly into the
continent so that it overlaps at least one grid cell within the domain. (3) Calculate
the species range size ($R_{\mathrm{domain}}$) as the part of the placed range that lies within the
domain. (4) Try to find one value in the empirical range-size distribution
($R_{\mathrm{empirical}}$) that is closest to $R_{\mathrm{domain}}$ and at the same time lies in the interval between
$R_{\mathrm{domain}} - \sqrt{R_{\mathrm{domain}}}$ and $R_{\mathrm{domain}} + \sqrt{R_{\mathrm{domain}}}$. If such a value is found, the species is
accepted as present in the domain and the $R_{\mathrm{empirical}}$ value is eliminated from the
empirical range size distribution. If an acceptable $R_{\mathrm{empirical}}$ is not found, steps 2 and
3 are repeated. (5) The procedure is repeated until all values of $R_{\mathrm{empirical}}$ have been
eliminated from the empirical distribution of range sizes (that is, all species have
been placed into the domain). This model keeps the distribution of
range sizes in the domain very close to the original range size distribution (it
preserves its mean and variance very well), and it partly eliminates the mid-domain
effect because large ranges may have various shapes and thus do not necessarily
reach the central areas.
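The five steps above can be sketched as follows. This is a hypothetical illustration, not the authors' code. Ranges are squares on a square grid, and sizes are drawn from the values not yet matched; that choice is an assumption which keeps the loop from stalling. `max_tries` is an added safeguard that the text does not mention.

```python
import math
import random


def place_model3(range_sizes, domain, rng, max_tries=100_000):
    """Sketch of the model 3 acceptance algorithm on a domain x domain grid.

    Returns the clipped ranges as half-open boxes (row0, row1, col0, col1).
    Square ranges and drawing sizes from the still-unmatched values are
    assumptions of this sketch; range sizes should not exceed domain ** 2.
    """
    remaining = list(range_sizes)  # empirical values not yet matched
    placed = []
    tries = 0
    while remaining:
        size = rng.choice(remaining)  # step 1: draw a range size
        side = max(1, round(math.sqrt(size)))
        while True:
            tries += 1
            if tries > max_tries:
                raise RuntimeError("placement did not converge")
            # step 2: random square overlapping at least one domain cell
            r = rng.randint(1 - side, domain - 1)
            c = rng.randint(1 - side, domain - 1)
            # step 3: keep only the part of the range inside the domain
            r0, r1 = max(r, 0), min(r + side, domain)
            c0, c1 = max(c, 0), min(c + side, domain)
            r_domain = (r1 - r0) * (c1 - c0)
            # step 4: closest unmatched empirical value within r_domain +/- sqrt(r_domain)
            tol = math.sqrt(r_domain)
            matches = [v for v in remaining if abs(v - r_domain) <= tol]
            if matches:
                remaining.remove(min(matches, key=lambda v: abs(v - r_domain)))
                placed.append((r0, r1, c0, c1))
                break  # step 5: continue until every empirical value is used
    return placed
```

Accepting every placement in step 4, instead of matching it against the empirical values, would roughly correspond to model 4, which gives up control over the resulting range-size distribution.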

Model 4: random placement, variable shape of ranges. As in the previous model, 
this model randomly places species’ ranges at least partly inside the domain, so that 
the areas outside the domain are cut off. However, there is no algorithm that would 
ensure that the resulting distribution of range sizes within the domain is similar to 
the original range size distribution. There is therefore no control over the mean 
and variance of the resulting distribution of range sizes within the domain, but this 
model does completely eliminate any mid-domain effect. 

Spatial patterns in range location and size. We performed an additional analysis 
(Supplementary Figs 13-15) to assess the similarity of spatial patterns of range 
location and size between simulation models and empirical data. Specifically, 
we investigated whether the spatial distribution of empirical ranges resembles a 
random placement (models 1, 3 and 4) or is closer to the clumped distribution
produced by model 2. 

To assess the degree of randomness in range locations we first calculated the
density (that is, count or ‘richness’) of species geographic range centroids (centres of
gravity) for all CS continents and all three taxa. Range centroids were estimated by
using gridded species’ ranges rather than original range polygons. Centroids falling
in between two grid cells were assigned randomly to one or the other. We then
calculated spatial correlograms by plotting Moran’s I of grid-cell centroid counts
against geographic distance. We repeated the same procedure for the simulation
models 1-4. The sizes of the artificial square continents required in these models
were identical to those of the continents explored using the CS design
(Supplementary Fig. 3). In these simulations we used the empirical distribution of
range sizes of each taxon in the continents explored using the CS design. We ran 100
simulations for each continent and taxon combination (1,500 in total) and
calculated mean correlograms of the simulations together with 95% confidence intervals.
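The centroid-counting step can be sketched as below. The sketch assumes that the centroid of a gridded range is the mean row and column index of its cells. A coordinate lying exactly halfway between two cells is assigned at random, as described above, and otherwise the nearest cell is used. The function names and the tuple representation are hypothetical.

```python
import random


def range_centroid_cell(cells, rng):
    """Assign a gridded range's centroid (centre of gravity) to one grid cell.

    The centroid is taken as the mean row and column of the range's cells.
    A coordinate exactly halfway between two cells is assigned at random to
    one of them; otherwise the nearest cell is used.
    """
    def snap(x):
        lower = int(x // 1)
        frac = x - lower
        if frac == 0.5:
            return lower + rng.randint(0, 1)  # random tie-break
        return lower if frac < 0.5 else lower + 1

    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return snap(sum(rows) / len(rows)), snap(sum(cols) / len(cols))


def centroid_counts(ranges, rng):
    """Count range centroids per grid cell (the 'richness' of centroids)."""
    counts = {}
    for cells in ranges:
        cell = range_centroid_cell(cells, rng)
        counts[cell] = counts.get(cell, 0) + 1
    return counts
```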
We additionally assessed the spatial randomness of species geographic range
sizes; that is, whether ranges of similar sizes tend to be clumped together or not. 
We used the same data and simulations as described above (CS design continents, 
100 simulations for each continent, taxon and model) and calculated correlograms 
of range sizes; that is, the autocorrelation of range sizes (measured as Moran’s I) 
plotted against the geographic distance between range centroids. For this we used 
all ranges, including those with centroids off the mainland. 
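Moran's I for a distance class uses the standard formula $I = \frac{N}{W}\frac{\sum_{i \ne j} w_{ij}(x_i - \bar x)(x_j - \bar x)}{\sum_i (x_i - \bar x)^2}$. The sketch below computes a correlogram with binary weights for pairs falling in each distance band, using planar distances. The choice of weights, distance metric and band edges are assumptions, and the 95% confidence intervals across simulations are not reproduced.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Moran's I for one distance class, with binary weights w_ij = 1 when
    d_min <= distance(i, j) < d_max (i != j) and 0 otherwise."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num, w_sum = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j and d_min <= math.dist(coords[i], coords[j]) < d_max:
                num += dev[i] * dev[j]
                w_sum += 1
    if w_sum == 0 or denom == 0:
        return math.nan  # undefined: no pairs in this class, or no variation
    return (n / w_sum) * (num / denom)


def correlogram(values, coords, breaks):
    """Moran's I for each distance class [breaks[k], breaks[k + 1])."""
    return [morans_i(values, coords, lo, hi) for lo, hi in zip(breaks, breaks[1:])]


# Alternating values along a line: nearest neighbours are negatively
# autocorrelated, so I for the first distance class is close to -1.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(correlogram([1, 0, 1, 0], coords, [0.5, 1.5]))
```

The same function applies both to centroid counts per grid cell and to range sizes indexed by their centroids, since only the values and coordinates change.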
A complete insect from the Late Devonian period


Romain Garrouste, Gaël Clément, Patricia Nel, Michael S. Engel, Philippe Grandcolas, Cyrille D’Haese, Linda Lagebro,
Julien Denayer, Pierre Gueriau, Patrick Lafaite, Sébastien Olive, Cyrille Prestianni & André Nel


After terrestrialization, the diversification of arthropods and
vertebrates is thought to have occurred in two distinct phases,
the first between the Silurian and the Frasnian stages (Late
Devonian period) (425-385 million years (Myr) ago), and the second
characterized by the emergence of numerous new major taxa,
during the Late Carboniferous period (after 345 Myr ago). These
two diversification periods bracket the depauperate vertebrate
Romer’s gap (360-345 Myr ago) and the arthropod gap (385-325 Myr
ago), which could be due to a preservational artefact. Although
recent molecular dating has given an age of 390 Myr for the
Holometabola, the record of hexapods during the Early-Middle
Devonian (411.5-391 Myr ago, Pragian to Givetian stages) is
exceptionally sparse and based on fragmentary remains, which hinders
establishing the timing of this diversification. Indeed, although Devonian
Archaeognatha are problematic, the Pragian of Scotland has yielded
some Collembola and the incomplete insect Rhyniognatha, with its
diagnostic dicondylic, metapterygotan mandibles. The oldest
definitively winged insects are from the Serpukhovian stage (latest
Early Carboniferous period). Here we report the first complete Late
Devonian insect, which was probably a terrestrial species. Its
‘orthopteroid’ mandibles are of an omnivorous type, clearly not
modified for a solely carnivorous diet. This discovery narrows the
45-Myr gap in the fossil record of Hexapoda, further demonstrates
a first Devonian phase of diversification for the Hexapoda,
as in vertebrates, and suggests that the Pterygota diversified before
and during Romer’s gap.

The insect was recovered from the Famennian Strud locality,
Namur province, Belgium (50° 26′ 43.32″ N, 5° 03′ 24.86″ E), known
widely for its early tetrapods. It was found in a freshwater
association including plants, numerous Crustacea (Branchiopoda and
Malacostraca) and Chelicerata (Eurypterida) (Fig. 1, Supplementary
Figs 1 and 2 and Supplementary Information 1). The specimen is
a two-dimensional compression with median abdominal
structures elevated owing to natural filling of the gut, which excludes the
possibility that the material represents an exuvia. A ‘shadow’ of organic
origin surrounds the body. The connections of the appendages with
the body are partly destroyed because of a well-known compression
and decay process, which makes some parts difficult to study. The
appendages were not displaced, except for one or two legs.


Class Insecta 
Clade Dicondylia 
Strudiella devonica gen. et sp. nov. 


Etymology. Strudiella is a diminutive form based on the type locality 
Strud (the name is feminine); devonica is after the Devonian age of the 
fossil. 
Figure 1 | Partial Strud stratigraphy (fossiliferous levels), Bois des Mouches Formation, Upper Famennian.
Figure 2 | General habitus of Strudiella devonica gen. et sp. nov. a, Photograph of the part. b, Reconstruction of general habitus. Scale bar, 1 mm. White arrows indicate legs visible on part. abd, abdomen; ant, antenna; h, head; md, mandible.


Holotype. IRSNB a12818a-b, part and counterpart, by present 
designation. 

Diagnosis and description. Apterous, body elongate and narrow,
8.0 mm long and 1.7 mm wide (Fig. 2a, b); six thoracic, uniramous
legs with tibiae and femora long and thin; antenna uniramous (Figs 2a
and 3a), long, with scape and pedicel distinctly broader than remaining
antennomeres, about 10 short flagellomeres; mandible triangular (of
metapterygotan form) with a continuous series of sharp but small
irregular molar and incisor cusps (Fig. 3b, c and Supplementary Fig. 3);
large dark eyes in posterior part of head; head rather small; thorax
broad, well separated from head and abdomen, with a rounded structure
covering posterior half of head, corresponding to an expanded
pronotum; abdomen divided into 10 segments, without lateral leglets,
gills or other appendicular structures.

Figure 3 | Strudiella devonica gen. et sp. nov., counterpart details.
a, Photograph of anterior part of head. b, Photograph of left mandible.
c, Reconstruction of left mandible. Scale bars, 0.45 mm (a), 0.2 mm (b),
0.1 mm (c). 3rd S, third antennal segment; ey, eye; md, mandible; p, maxillary
palp; pe, pedicel; Sc, scape.

The presence of a thorax separated from the head and abdomen,
bearing three pairs of legs, is one of the hallmark apomorphies of the
Hexapoda. Further features of significance are long, uniramous legs,
one pair of long, uniramous antennae, large eyes, an abdomen divided
into 10 segments and the absence of abdominal leglets. Although these
can be found individually in different crustacean clades, their
combination with the aforementioned apomorphy of hexapods is distinctly
insectan and precludes an interpretation of this fossil as a juvenile
notostracan (Notostraca are abundant in the layer; see Supplementary
Information). A scape and pedicel distinctly broader than the distal
segments constitute a synapomorphy of the Insecta within the Hexapoda,
related to the presence of muscles and the Johnston organ. Enlarged
first two antennal segments are also present in the Chilopoda,
especially lithobiomorphs; however, the combination of characters does
not match attribution to the Myriapoda. Among the Hexapoda, the long
femora and tibiae correspond to those found among pterygotes, rather
than those of the wingless Archaeognatha or Zygentoma. Within the
Pterygota, the mandibles are of an ‘orthopteroid’ type, short and
triangular, which is currently considered an apomorphy of the winged
Metapterygota (Pterygota, excluding Palaeodictyoptera and
Ephemeroptera). Unfortunately, wings are not observable on the present
specimen. The absence of wings and the minute size suggest that the
individual was a nymph, but the absence of genital structures cannot be
conclusively established owing to the poor preservation of the
abdominal apex.

Figure 4 | Phylogeny of basal hexapod clades. Hexapoda gap in dark grey, Romer’s gap in pale grey, (411.5 Myr ago); Rhyniella praecursor (1) Hirst and Maulik, 1926; Rhyniognatha hirsti (2) Tillyard, 1928; undescribed fossil (3) from Gilboa (391 Myr ago). †, no extant lineages.

The expanded pronotum is similar to those of many insect lineages
(for example, Dictyoptera and Grylloblattodea), and its presence in a
Middle Palaeozoic insect is not surprising. The lack of lateral
appendages (leglets and gills) on the abdominal segments is clearly
derived relative to the condition in the wingless Zygentoma, and
differs from nymphs (but not adults) of several pterygote orders,
although whether these appendages are individually derived in those
immatures or are plesiomorphic is debatable. Strudiella shares, with the
most basal hexapod lineages, antennae with uniformly similar
flagellomeres.

The long legs without adaptations for swimming, plus the apparent
absence of abdominal gills and the rarity of Strudiella within the arthropod
fauna of the Strud locality, support the hypothesis that it was a terrestrial
animal. It must have been omnivorous or phytophagous but certainly
not carnivorous, given the weak development of the incisor compared
with the molar cusps, and the sharp irregular cusps corresponding to
the ‘omnivorous type’ of Gangwere. Strudiella and Rhyniognatha
certainly had different feeding habits, as the mandibles of
Rhyniognatha had sharper cusps, corresponding to mycetophagy
and/or saprophagy but not to carnivory.

Ward et al. proposed that the Early Carboniferous low-oxygen
interval corresponding to Romer’s gap (360-345 Myr ago)
constrained the timing of diversification of the Hexapoda as well as of other
arthropods. The high morphological disparity of the pterygote insects in
the Serpukhovian stage (320 Myr ago), now well documented,
suggests that the diversification of this clade occurred very rapidly after
Romer’s gap, or more likely during it, in accordance with the recent
discoveries of Early Carboniferous terrestrial arthropods. Strudiella
demonstrates further that an early diversification of the dicondylic
insects occurred before Romer’s gap (Fig. 4), consistent with
the presence of diverse and abundant terrestrial vegetation, including
forests, since the mid-Devonian.


METHODS SUMMARY 


The material is housed at the Royal Belgian Institute of Natural Sciences (Brussels,
Belgium). The fossils were prepared using a sharp knife. Photographs were taken
using an Olympus SZX9 stereomicroscope system with an Olympus E3 digital
camera, with the fossil moistened with 70% alcohol. Illustrations were prepared
using a camera lucida on an Olympus SZX9 binocular microscope.
Revealing structure and assembly cues for
Arabidopsis root-inhabiting bacterial microbiota 
Philipp Rauf, Bruno Huettel, Richard Reinhardt, Elmon Schmelzer, Joerg Peplies, Frank Oliver Gloeckner, Rudolf Amann,
Thilo Eickhorst & Paul Schulze-Lefert


The plant root defines the interface between a multicellular eukaryote
and soil, one of the richest microbial ecosystems on Earth. Notably,
soil bacteria are able to multiply inside roots as benign endophytes
and modulate plant growth and development, with implications
ranging from enhanced crop productivity to phytoremediation.
Endophytic colonization represents an apparent paradox of plant
innate immunity because plant cells can detect an array of
microbe-associated molecular patterns (also known as MAMPs) to initiate
immune responses that terminate microbial multiplication. Several
studies have attempted to describe the structure of bacterial root
endophytes; however, different sampling protocols and low-resolution
profiling methods make it difficult to infer general principles. Here
we describe a methodology to characterize and compare soil- and
root-inhabiting bacterial communities, which reveals a role not only
for metabolically active plant cells but also for inert cell-wall features
in the selection of soil bacteria for host colonization. We show that the
roots of Arabidopsis thaliana, grown in different natural soils under
controlled environmental conditions, are preferentially colonized by
Proteobacteria, Bacteroidetes and Actinobacteria, and each bacterial
phylum is represented by a dominating class or family. Soil type
defines the composition of root-inhabiting bacterial communities,
and host genotype determines their ribotype profiles to a limited
extent. The identification of soil-type-specific members within the
root-inhabiting assemblies supports our conclusion that these
represent soil-derived root endophytes. Surprisingly, plant cell-wall
features of other tested plant species seem to provide a sufficient
cue for the assembly of approximately 40% of the Arabidopsis
bacterial root-inhabiting microbiota, with a bias for Betaproteobacteria.
Thus, this root sub-community may consist not of Arabidopsis-specific
bacteria but of saprophytic bacteria that would naturally be found on any
plant root or plant debris in the tested soils. By contrast, colonization of
Arabidopsis roots by members of the Actinobacteria depends on
other cues from metabolically active host cells.

We have grown Arabidopsis ecotypes Shakdara (Sha) and Landsberg
erecta (Ler) in natural soils of contrasting geochemistry, designated
Cologne (clay- and silt-rich) or Golm (sand-rich) soil, under
controlled environmental conditions and at a defined planting density
(Supplementary Fig. 1 and Supplementary Table 1). At the early flowering
stage we collected samples from three compartments: ‘unplanted soil’
(number of replicates: Cologne $n_C$ = 13, Golm $n_G$ = 12), ‘rhizosphere’
($n_C$ = 15, $n_G$ = 12) and ‘root’ ($n_C$ = 18, $n_G$ = 14). The ‘rhizosphere
compartment’ is defined as the soil particles firmly attached to roots,
collected by centrifugation of root washings (Supplementary Movie
1). The ‘root compartment’ is defined as root tissue depleted of soil
particles and epiphytic bacteria by sequential washing and sonication
treatments and is therefore enriched for root-inhabiting bacteria
(Supplementary Fig. 2). We used pyrosequencing of an approximately
400-base-pair PCR amplicon of the bacterial 16S ribosomal RNA gene
and analysed the variable gene segments V5-V6.

To examine the taxonomic structure of the bacterial communities
we performed a supervised taxonomy classification of all high-quality
reads using the SILVA’ database. This classification identified a total of
43 bacterial phyla and divisions and revealed an anomalously high
Chloroflexi abundance in all samples (Fig. 1a). PCR-independent
catalysed reporter deposition-fluorescence in situ hybridization
(CARD-FISH) analysis on soil samples (Supplementary Fig. 3) and
comparative PCR primer analysis indicated that this is due to a PCR primer
bias (Supplementary Information and Supplementary Fig. 4). After
removal of reads assigned to Chloroflexi we identified Proteobacteria,
Actinobacteria and Bacteroidetes as the dominant phyla in root bacterial
communities and as significantly enriched compared to soil and
rhizosphere (Fig. 1b). Within the root-inhabiting Proteobacteria we
noted an over-representation of Betaproteobacteria families compared
to Alphaproteobacteria and Gammaproteobacteria (Fig. 1c).
Likewise, in each of the other two root-inhabiting phyla we noted
a dominant family: Flavobacteriaceae for the phylum Bacteroidetes
and Streptomycetaceae for the phylum Actinobacteria (Fig. 1c). The
Streptomycetaceae and the families belonging to the Betaproteobacteria
are significantly enriched in the root compared to unplanted soil and
rhizosphere compartments in both soils tested (Fig. 1c and Supplementary
Data 1). Together this indicates that building blocks of Arabidopsis
root-inhabiting bacterial communities are detected at the family level.

Figure 1 | Taxa at high taxonomic ranks define building blocks of
root-associated bacterial communities. a, Average relative abundance
(‰ ± s.e.m.) of the phylum Chloroflexi detected in the indicated
compartments. b, Average relative abundance (‰ ± s.e.m.) of the dominant
phyla (> 5‰) detected in root compartments of the indicated soil types.
c, Average relative abundance (‰ ± s.e.m.) of families belonging to the three
dominant phyla in the root compartment. In b and c average relative
abundances are calculated after removal of reads assigned to Chloroflexi.
Asterisks indicate significant enrichment (Benjamini-Hochberg false-discovery-rate
(FDR) adjusted P value < 0.05) in the root compartment
compared to soil and rhizosphere compartments.
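The renormalization described above, in which reads assigned to Chloroflexi are removed before relative abundances are recomputed and reported in per mille (‰) with a standard error across replicate samples, can be sketched as follows. This is a minimal illustration assuming a simple per-sample mapping from taxon to read count; the function names and toy counts are hypothetical and do not reproduce the authors' pipeline.

```python
import math

def per_mille_without(taxon_counts, excluded):
    """Relative abundance in per mille after dropping reads of excluded taxa.

    taxon_counts: hypothetical mapping taxon -> read count for one sample.
    excluded: taxa (e.g. {"Chloroflexi"}) removed before renormalizing.
    """
    kept = {t: c for t, c in taxon_counts.items() if t not in excluded}
    total = sum(kept.values())
    if total == 0:
        return {}
    # Remaining taxa are rescaled so that they again sum to 1,000 per mille.
    return {t: 1000 * c / total for t, c in kept.items()}

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)

# Two hypothetical replicate samples.
samples = [
    {"Chloroflexi": 500, "Proteobacteria": 300, "Actinobacteria": 150, "Bacteroidetes": 50},
    {"Chloroflexi": 200, "Proteobacteria": 400, "Actinobacteria": 300, "Bacteroidetes": 100},
]
profiles = [per_mille_without(s, {"Chloroflexi"}) for s in samples]
print(profiles[0])  # {'Proteobacteria': 600.0, 'Actinobacteria': 300.0, 'Bacteroidetes': 100.0}
print(mean_sem([p["Proteobacteria"] for p in profiles]))  # (550.0, 50.0)
```

Removing a taxon before normalizing, rather than after, is what makes the remaining phyla sum to 1,000‰ again.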

We noted that unplanted soil and rhizosphere contain a disproportionate
number of reads that cannot be unambiguously classified at the
order level using two different databases (Supplementary Fig. 5 and
Supplementary Information), indicative of an insufficient database
representation of the biodiversity of soil-borne bacteria*. To overcome
this limitation, we clustered the 16S rRNA gene sequences of all
compartments and defined operational taxonomic units (OTUs) of
bacteria at ≥97% sequence identity. Bacterial diversity measured as
OTU richness was estimated in the three compartments by rarefaction
analysis and revealed the greatest number of OTUs in unplanted soil
(~2,000), followed by a reduced richness in rhizosphere and root
compartments (each ~1,000) (Supplementary Fig. 6). Technical replicates
of the same DNA sample defined a minimum threshold of 5‰
relative abundance for reproducible quantification of individual OTUs
(Supplementary Fig. 7). For subsequent analysis we also removed
OTUs assigned to the phylum Chloroflexi; as a consequence, nine
samples with fewer than 1,000 reads were excluded (Supplementary
Information).
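Rarefaction, as used above to compare OTU richness between compartments, repeatedly subsamples each sample to a common read depth without replacement and counts the OTUs observed. The sketch below assumes OTU counts are held in a plain dictionary per sample; the function names, the number of iterations and the seeding are illustrative choices, not the study's settings. It also shows the exclusion of samples with fewer than 1,000 reads.

```python
import random

def rarefy(otu_counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample."""
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    sub = {}
    for otu in rng.sample(reads, depth):
        sub[otu] = sub.get(otu, 0) + 1
    return sub

def rarefaction_curve(otu_counts, depths, iterations=20, seed=0):
    """Mean number of observed OTUs at each subsampling depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        richness = [len(rarefy(otu_counts, depth, rng)) for _ in range(iterations)]
        curve.append(sum(richness) / iterations)
    return curve

def drop_shallow_samples(table, min_reads=1000):
    """Keep only samples with at least `min_reads` reads in total."""
    return {name: counts for name, counts in table.items()
            if sum(counts.values()) >= min_reads}

table = {
    "soil_1": {f"otu{i}": 10 for i in range(200)},      # 2,000 reads, 200 OTUs
    "root_1": {"otu0": 900, "otu1": 100, "otu2": 50},   # 1,050 reads, 3 OTUs
    "root_2": {"otu0": 600},                            # 600 reads: excluded
}
kept = drop_shallow_samples(table)
print(sorted(kept))  # ['root_1', 'soil_1']
for name, counts in kept.items():
    print(name, rarefaction_curve(counts, [10, 100, 1000]))
```

Comparing richness at a shared depth removes the effect of unequal sequencing effort between samples, which is why the curves, not raw OTU counts, are compared.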

To compare the composition of the identified community members
we generated a hierarchical cluster based on Bray-Curtis distance
(Supplementary Fig. 8). Consistent with studies using other plant species,
this showed a striking effect of the host on associated bacterial microbiota”.
The root-inhabiting microbiota is significantly dissimilar from the
communities retrieved from rhizosphere and unplanted soil compartments
(Supplementary Fig. 8 and Supplementary Table 2). Closer examination
also showed a marked soil-type-dependent effect on both unplanted soil
and rhizosphere communities, indicating a different natural bacterial
start inoculum in the tested soils. A differentiation between unplanted
soil and rhizosphere microbiota across seasonal soil batches was not
evident from the cluster dendrogram (Supplementary Fig. 8). However, a
distinct bacterial community in the rhizosphere compartment is
detectable when looking at samples obtained from the same soil batch
(Supplementary Fig. 9). This differentiation is reflected by a significant
rhizosphere effect according to PERMANOVA analysis of the Bray-Curtis
distance matrix (Supplementary Table 2), indicating that a
rhizosphere effect is obscured by soil batch-to-batch variation.
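The two analyses in this paragraph, Bray-Curtis dissimilarity between community profiles and a PERMANOVA test on the resulting distance matrix, can be sketched in a few lines. This is a minimal illustration assuming dictionaries of OTU counts per sample and a single grouping factor. It implements the one-way PERMANOVA pseudo-F statistic with a label-permutation P value and is not the exact software or model used in the study (see Supplementary Information); the toy profiles are invented.

```python
import random

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two profiles (dicts OTU -> abundance)."""
    otus = set(u) | set(v)
    num = sum(abs(u.get(o, 0) - v.get(o, 0)) for o in otus)
    den = sum(u.get(o, 0) + v.get(o, 0) for o in otus)
    return num / den if den else 0.0

def distance_matrix(profiles):
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]

def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F computed from a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    if a < 2:
        raise ValueError("need at least two groups")
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    if ss_within == 0:
        return float("inf")
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=0):
    """Pseudo-F and permutation P value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

profiles = [
    {"x": 10, "y": 1}, {"x": 9, "y": 2}, {"x": 10, "y": 2},  # e.g. root samples
    {"x": 1, "y": 10}, {"x": 2, "y": 9}, {"x": 1, "y": 9},   # e.g. soil samples
]
groups = ["root"] * 3 + ["soil"] * 3
f_stat, p = permanova(distance_matrix(profiles), groups)
print(round(f_stat, 1), p)
```

With three samples per group only 20 distinct labelings exist, so the permutation P value here cannot fall much below 0.1 however clear the separation; a real test of a rhizosphere effect needs more replicates.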

To identify bacteria responsible for the observed community differ- 
entiation (Supplementary Fig. 8), we used a linear model analysis to 
determine indicator OTUs for each tested compartment (Supplemen- 
tary Data 1). Consistent with the Bray-Curtis cluster analysis, we 
found a subset of OTUs significantly enriched in roots, whereas 
unplanted soil and rhizosphere compartments share a large propor- 
tion of OTUs (Fig. 2a). 
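The Benjamini-Hochberg false-discovery-rate correction cited in the figure legends for calling root-enriched OTUs can be sketched as follows. The per-OTU P values are assumed to come from a separate test (the linear model itself is not reproduced here), and the OTU names and values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values from a per-OTU test of root versus soil abundance.
otus = ["otu_a", "otu_b", "otu_c", "otu_d"]
pvals = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(pvals)
root_enriched = [o for o, q in zip(otus, fdr) if q < 0.05]
print(root_enriched)  # ['otu_a']
```

Note that otu_b and otu_c would pass an uncorrected 0.05 cut-off but not the FDR < 0.05 criterion.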

To obtain insights into potential plant-derived assembly cues for the
root-inhabiting microbiota, we incubated untreated wooden splinters,
representing metabolically inactive lignocellulosic matrices for bacterial
colonization (Supplementary Figs 1 and 10), as an additional compartment
in both tested soils. We determined the associated bacterial communities
of the softwood birch (Betula, nc = 8; ng = 11) and the hardwood
beech (Fagus, nc = 4; ng = 4). Remarkably, approximately 40% of
the Arabidopsis root-enriched OTUs in Cologne soil are equally or
even more abundant in these wooden compartments (Fig. 2b, c and
Supplementary Data 1). The identification of this shared sub-community
(designated lOTUs for lignocellulosic matrix-associated OTUs) indicates
that plant cell-wall features serve as a sufficient colonization cue.
A second sub-community, specifically enriched in roots (designated
rOTUs), seems to depend on other or additional cues from metabolically
active host cells (Fig. 2b, c and Supplementary Data 1). A third
sub-community is specifically enriched in the wooden matrices
(designated wOTUs; Fig. 2b, c and Supplementary Data 1).
Under-representation of wOTUs in roots might reflect a selective
inhibitory plant activity against colonization by these bacteria. The
three sub-communities identified through a comparison of root and
wooden compartments were also seen in analogous experiments in
Golm soil (Supplementary Fig. 11 and Supplementary Data 1).
Distinctive root- and wood-derived OTU profiles and subtle differences
between soil- and rhizosphere-derived profiles were further supported
by the classifiability of the compartments (Supplementary Fig. 12 and
Supplementary Table 3) and by non-parametric tests (Supplementary
Fig. 13), and were robust against the primer bias (Supplementary
Fig. 14). Notably, a different taxonomic structure defines each of the three
sub-communities. Whereas Proteobacteria represent the vast majority
of wOTUs and lOTUs community members, Actinobacteria were
largely underrepresented in these communities compared to the
rOTUs community (Fig. 2d and Supplementary Fig. 11). Hence,
Actinobacteria are specifically enriched in roots. To examine a potential
role of soil type on the root-inhabiting bacterial assemblage, we compared
the bacterial profiles obtained in Cologne and Golm soils (Fig. 2e).

Figure 2 | Arabidopsis assembles a distinctive root-inhabiting bacterial
microbiota. a-g, Compartment specificity and relative abundance of OTUs
(> 5‰) determined in samples from Cologne soil (a-d) and in comparison to
Golm soil (e-g). a, Ternary plot of all OTUs. Each circle represents one OTU.
The size of each circle represents its relative abundance (weighted average). The
position of each circle is determined by the contribution of the indicated
compartments to the total relative abundance. The dotted grid and numbers
inside the plot indicate 20% increments of contribution from each
compartment (see Supplementary Methods). Dark blue circles mark OTUs
significantly enriched in the root compartment (FDR < 0.05). b, Ternary plot
similar to a including the wood compartment. OTUs significantly enriched in
root (dark blue, rOTUs), wood-enriched community members (orange,
wOTUs) and OTUs shared by root and wood compartments (light blue,
lOTUs) (FDR < 0.05). c, Heat map of the relative abundance of root- and
wood-enriched OTUs. Vertical columns represent samples, horizontal rows
depict OTUs. Clustering of samples (top) is based on OTU co-occurrence.
Colour code on the left indicates OTU compartment specificity as defined in
b. d, Taxonomic composition of rOTUs, lOTUs and wOTUs sub-communities.
The size of each segment in the chart is proportional to the cumulative relative
abundance of OTUs assigned to the indicated taxa. e, Numbers of lOTUs and
rOTUs in the indicated soils (FDR < 0.05). f, Relative abundance of OTU
Actinocorallia sp. in the indicated compartments (mean ± s.e.m.). Asterisks
indicate significant enrichment in the root compartment over all other
indicated compartments (FDR < 0.05). g, Differential relative abundance of
OTU Actinocorallia sp. in the root compartment of the indicated Arabidopsis
ecotypes (mean ± s.e.m.). Asterisks indicate significant differences (FDR < 0.05).
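The ternary plots in Fig. 2 place each OTU according to the share of its abundance contributed by each compartment. A minimal sketch of that mapping follows, assuming mean relative abundances per compartment are already available and using an arbitrary, hypothetical assignment of compartments to triangle corners. The three shares act as barycentric weights that are converted into plotting coordinates.

```python
import math

# Hypothetical corner assignment: one compartment per vertex of a unit triangle.
CORNERS = {
    "soil": (0.0, 0.0),
    "rhizosphere": (1.0, 0.0),
    "root": (0.5, math.sqrt(3) / 2),
}

def ternary_xy(mean_abundance):
    """Map one OTU's mean relative abundance per compartment to (x, y).

    The shares a_c / sum(a) are barycentric weights on the corners, so an OTU
    found only in roots sits on the 'root' vertex and an OTU with equal
    abundance in all three compartments sits at the centroid.
    """
    total = sum(mean_abundance[c] for c in CORNERS)
    if total == 0:
        raise ValueError("OTU not detected in any compartment")
    x = sum(mean_abundance[c] / total * cx for c, (cx, _) in CORNERS.items())
    y = sum(mean_abundance[c] / total * cy for c, (_, cy) in CORNERS.items())
    return x, y

print(ternary_xy({"soil": 2.0, "rhizosphere": 2.0, "root": 2.0}))  # centroid
print(ternary_xy({"soil": 0.0, "rhizosphere": 0.0, "root": 7.5}))  # root vertex
```

Because only the shares enter the position, circle size has to be drawn separately to convey each OTU's overall abundance, as the legend describes.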

Within the rOTUs community, 9 OTUs were enriched in roots of 
both tested soil types, whereas 21 and 13 OTUs were significantly 
enriched in roots derived from Cologne or Golm soils, respectively 
(Fig. 2e). Notably, closer examination confirms a significant differential
enrichment between the soil types for 15 of the root-inhabiting OTUs
(lOTUs plus rOTUs) (Supplementary Fig. 15 and Supplementary
Data 1). This identifies soil type as an important environmental variable
influencing the quantitative and qualitative composition of the 
Arabidopsis root-inhabiting microbiota. This also points to soil-type- 
dependent root colonization processes and/or reflects different bacterial 
start inocula. 

A root-enriched OTU in Cologne and Golm soil, classified as 
Actinocorallia sp., belonging to the Actinobacteria phylum (Fig. 2f), 
showed a differential accumulation between the two Arabidopsis 
ecotypes tested, Sha and Ler (Fig. 2g). An approximately tenfold dif- 
ference in relative abundance was observed in independent experi- 
ments using Cologne soil collected during fall or spring. A similar 
differential accumulation trend was found between these ecotypes
grown in Golm soil, although it was not statistically significant. Using a
similar experimental platform’, a small number of root-inhabiting OTUs
(12) whose accumulation is quantitatively influenced by the host genotype
in specific soil environments was identified among eight Arabidopsis
ecotypes tested’*. End-point PCR analysis using Actinomycetales-specific
PCR primers designed on the basis of the representative sequence of OTU
Actinocorallia sp. (Supplementary Table 4) independently validated
the presence of these bacteria in soil-grown roots of Sha and Ler
(Supplementary Fig. 16). PCR amplicons were also detectable in unplanted
soil samples but not in surface-sterilized and crushed seeds of either
accession (Supplementary Fig. 16), indicating that the root-inhabiting
Actinocorallia are recruited from soil. These findings indicate that
natural genetic variation in A. thaliana exerts quantitative control
over root-inhabiting bacteria and that the resulting phenotypic variation
is influenced by soil type as an environmental component.

The Arabidopsis rhizoplane, the interface between host tissue and 
rhizosphere soil (Supplementary Fig. 2), was largely eliminated by 
sonication during sample preparation for the pyrosequencing-based 
16S rRNA gene survey. Therefore, we used scanning electron micro- 
scopy (SEM) and adopted a modified fluorescence in situ hybridiza- 
tion (FISH) protocol (CARD-FISH; Supplementary Information) to 
characterize and visualize bacteria attached to the rhizoplane. SEM
analysis revealed morphologically diverse bacteria-like structures
(Fig. 3a-e), and CARD-FISH with the probe EUB338, which detects the
majority of known Eubacteria, identified a high density of bacteria
(Fig. 3f and Supplementary Table 5). Because the CARD-FISH signal
depends on cellular rRNA content, it also serves as a proxy for the
metabolic state of the detected bacteria.
Particulate CARD-FISH signals were not found using the reverse
complement probe or upon hybridization with roots of axenically
grown plants (Fig. 3g and Supplementary Fig. 17). All three taxa that
dominate the root-inhabiting community based on pyrosequencing of
PCR-amplified DNA were also detected on the rhizoplane by CARD-FISH
(Betaproteobacteria, Bacteroidetes and Actinobacteria; Fig. 3h-j).
Each CARD-FISH probe detected a distinct colonization pattern for the
respective phylum. The same phylum-specific CARD-FISH probes
detected very few signals in unplanted soil and rhizosphere samples
compared to the EUB338 probe (Supplementary Fig. 3). Together with
undetectable 16S rRNA gene amplicons from surface-sterilized and
crushed seeds or roots from axenically grown plants (Supplementary
Fig. 16), this indicates that these rhizoplane-attached bacteria are
derived from the start inoculum of soil bacteria.

Figure 3 | Arabidopsis root-inhabiting bacteria are detectable on the
rhizoplane. a-e, Scanning electron micrographs of bacteria-like structures.
Bars, 1 µm. f-j, CARD-FISH detection of bacteria (green, AlexaFluor488) on the
root surface (red, root autofluorescence) by confocal laser scanning microscopy.
f, Most Eubacteria detected with probe EUB338. g, Negative control with the reverse
complementary probe of EUB338 (NONEUB). h, Betaproteobacteria detected
with probe BET42a. i, Bacteroidetes detected with probe CF319a.
j, Actinobacteria detected with probe HGC69a. Bars, 20 µm.

Are the results from our controlled environment experiments rel- 
evant under natural conditions? We collected roots of A. thaliana 
plants naturally grown at a site close to Cologne and determined 
bacterial profiles in the root, rhizosphere and corresponding soil com- 
partments (Supplementary Information and Supplementary Data 1). 
These samples differ in many aspects from the controlled environment 
samples including vegetation period, soil geochemistry (Supplemen- 
tary Table 1), climatic conditions, inter-species competition, un- 
controlled biotic/abiotic stresses and an unknown A. thaliana 
genotype different from Ler and Sha accessions (Supplementary Fig. 18). 
Despite the many differences from the controlled environment experiments,
the distinctiveness of the root-inhabiting microbiota from those
of rhizosphere and unplanted soil compartments is retained in 
naturally grown Arabidopsis (Fig. 4a). Bray-Curtis analysis of the 
combined data from greenhouse and naturally grown plants confirms 
that the compartments are the major determinants of community 
structure. Notably, a large proportion of OTUs retrieved under 
greenhouse conditions were also detected in each of the three tested 
compartments of naturally grown Arabidopsis plants (Supplementary 
Fig. 19). This includes OTUs belonging to the root-enriched rOTUs 
and lOTUs sub-communities of the greenhouse experiments (compare 
Fig. 2e and Fig. 4b). Because these OTUs were preferentially detected in 
the root compartment of the natural site (Fig. 4c), the selectivity of 
Arabidopsis roots for the rOTUs seems robust in both natural and
controlled environments. For example, the enrichment gradient found
between the three tested compartments for the root-specific OTU 
Actinocorallia sp. in the natural specimens is similar to the one under 
controlled environmental conditions (Fig. 4d, compare to Fig. 2f). 

Reproducible measurements of bacterial microbiota under con- 
trolled environmental conditions revealed a function for metabolically 
active plant cells and cell-wall features in the selection of soil bacteria 
for host colonization. Root and wooden compartments are preferen- 
tially colonized by Betaproteobacteria and Bacteroidetes (Fig. 2d and 
Supplementary Fig. 11). Members of these two phyla are characterized
as copiotrophic soil bacteria, that is, they compete successfully only
when organic resources are abundant’*. Thus, colonization of
Arabidopsis roots by the lOTUs sub-community could reflect their
ability to proliferate in the presence of polysaccharide polymers and
might contribute to the decomposition of organic matter after plant
death. Our study predicts that the lOTUs sub-community is not
Arabidopsis-specific but may consist of saprophytic bacteria that would
naturally be found on any plant root or plant debris in the tested soils.
In contrast, the selective enrichment of Actinobacteria in the rOTUs
community (Fig. 2d, Supplementary Fig. 11 and Supplementary Data
1) indicates that cues from metabolically active host cells are needed
for the assembly of the rOTUs sub-community. Together with the
observation that a subset of soil bacteria is enriched in the wooden
compartment compared to roots, our results point to an active host
process mediating attractant and repellent activities. Members of the
rOTUs sub-community may provide probiotic functions for the plant.
For example, Actinobacteria are known to produce a vast diversity of
antimicrobial compounds".

Figure 4 | Root selectivity of soil-borne bacteria is retained in naturally
grown Arabidopsis plants. a, Bray-Curtis dissimilarity of samples collected
from a natural (black) and controlled greenhouse environment (coloured).
OTU counts were rarefied to 1,000 counts per sample and OTUs with relative
abundance > 5‰ were included in the analysis. b, Percentage of shared OTUs
between root-associated sub-communities found at controlled environmental
conditions in Cologne or Golm soils and naturally grown Arabidopsis plants
(comparison based on numbers shown in Fig. 2e). c, Ternary plot of the relative
abundance and proportional contribution of OTUs (> 5‰) in the indicated
compartments displaying lOTUs (light blue) and rOTUs (dark blue) identified
in greenhouse experiments. d, Relative abundance of OTU Actinocorallia sp.
(mean ± s.e.m.; n.d., not detected) in the indicated compartments. Asterisks
indicate a significant difference (FDR < 0.05) between root and rhizosphere.

The composition of the putative root endophyte microbiota is influ- 
enced by the soil type (Fig. 2e and Supplementary Fig. 15), which could 
reflect different natural start inocula in the three tested soils and sup- 
ports our conclusion that at least a subset of these communities repre- 
sents Arabidopsis root endophytes originating from the microbiota 
reservoir present in natural soil. The host genotype was found to have 
a limited effect on the root endophyte profile (Fig. 2g), which is remin- 
iscent of the mouse gut microbiota for which the host genotype quan- 
titatively contributes to its structure’’. Strikingly similar findings are 
reported in ref. 12 using two additional soil types, eight Arabidopsis 
accessions and a similar fractionation protocol, but different PCR 
primers and different computational pipelines, thereby supporting 
the generality of our conclusions. Thus, our work provides a founda- 
tion for future molecular studies elucidating how Arabidopsis roots 
control and tolerate colonization by a specific endophyte community 
despite an elaborate innate immune system, including receptors for 
conserved bacterial structures such as flagellin’®. 


METHODS SUMMARY 


A detailed description of the natural soils and all methods used in this study can be 
found in the Supplementary Information. Arabidopsis thaliana ecotypes Shakdara 
and Landsberg erecta were grown in pots filled with natural Cologne or Golm soil 
under long-day conditions (16-h photoperiod) at a defined planting density and 
roots were harvested at early flowering stage. Wooden Fagus and Betula splinters
were inserted into soil to a depth of approximately 4 cm and, together with unplanted
soil pots, subjected to the same conditions as pots with living plants. The uppermost
3 cm of roots and soil-incubated wood were harvested and adhering soil was
washed off and defined as ‘rhizosphere compartment’. After washing, roots and
wooden splinters were sonicated to remove the bacterial surface biofilm and to
enrich for endophytic bacteria (‘root’/‘wood’ compartments). DNA extraction of
all compartments was performed using the MP Bio Fast DNA for Soil Kit. 
Barcoded bacterial 16S rRNA gene PCR amplicons were generated using a modi- 
fied version of previously described primers’”'* in combination with a touch- 
down PCR programme (Supplementary Table 6) to minimize host rRNA gene 
amplification. Amplicons were gel-purified (Qiagen), pooled and sequenced on a 
454 Titanium platform (Roche). We performed a classification of the 454 reads 
using the SILVA’ database. For OTU-based analysis we used PyroTagger’’ to 
screen for high-quality sequences and to cluster them at 97% sequence identity.
Statistical analyses were performed using a series of packages and scripts 
developed in R. Significant differences of OTU- or SILVA-based taxonomy counts 
between samples from two compartments or in interaction terms were obtained 
using moderated t-tests on log-transformed relative abundance and corrected for 
multiple hypothesis testing. CARD-FISH experiments were performed as previ- 
ously described with minor modifications”’*’. SEM micrographs were recorded as 
described previously”. 
Land plants associate with a root microbiota distinct from the complex
microbial community present in surrounding soil. The microbiota
colonizing the rhizosphere (immediately surrounding the root)
and the endophytic compartment (within the root) contribute to 
plant growth, productivity, carbon sequestration and phytoremedia- 
tion’*. Colonization of the root occurs despite a sophisticated 
plant immune system*”*, suggesting finely tuned discrimination of 
mutualists and commensals from pathogens. Genetic principles 
governing the derivation of host-specific endophyte communities 
from soil communities are poorly understood. Here we report the 
pyrosequencing of the bacterial 16S ribosomal RNA gene of more 
than 600 Arabidopsis thaliana plants to test the hypotheses that the 
root rhizosphere and endophytic compartment microbiota of plants 
grown under controlled conditions in natural soils are sufficiently 
dependent on the host to remain consistent across different soil 
types and developmental stages, and sufficiently dependent on host 
genotype to vary between inbred Arabidopsis accessions. We 
describe different bacterial communities in two geochemically dis- 
tinct bulk soils and in rhizosphere and endophytic compartments 
prepared from roots grown in these soils. The communities in each 
compartment are strongly influenced by soil type. Endophytic com- 
partments from both soils feature overlapping, low-complexity com- 
munities that are markedly enriched in Actinobacteria and specific 
families from other phyla, notably Proteobacteria. Some bacteria 
vary quantitatively between plants of different developmental stage 
and genotype. Our rigorous definition of an endophytic compart- 
ment microbiome should facilitate controlled dissection of plant- 
microbe interactions derived from complex soil communities. 

Roots influence the rhizosphere by altering soil pH, soil structure, 
oxygen availability, antimicrobial concentration, and quorum-sensing 
mimicry, and by providing an energy source of dead root material 
and carbon-rich exudates’. The microbiota inhabiting this niche 
can both benefit and undermine plant health; shifting this balance is 
of agronomic interest. Mutualistic microbes may provide the plant with 
physiologically accessible nutrients and phytohormones that improve 
plant growth, may suppress phytopathogens or may help plants 
withstand heat, salt and drought*”. The rhizosphere community is a 
subset of soil microbes that are subsequently filtered via niche utiliza- 
tion attributes and interactions with the host to inhabit the endophytic 
compartment" (EC). Although a variety of microbes may enter and 
become transient endophytes, those consistently found inside roots are 
candidate symbionts or stealthy pathogens'*"'. Notably, Arabidopsis 
and other Brassicaceae are not well colonized by arbuscular mycorrhizal 
fungi, implying that other microorganisms may fill this niche. 


Microbial community structure differs across plant species'*”’, and
there are reports of host-genotype-dependent differences in patterns of
microbial associations'*’*. However, the divergent methods used in 
those studies relied on small sample sizes and low-resolution phylotyp- 
ing techniques potentially confounded by off-target sequences and 
chimaeric amplicons. We developed a robust experimental system to 
repeatedly sample the root microbiome using high-throughput
sequencing. Our results confirm many of the general conclusions from 
earlier studies and, because of controlled experimental design and the 
power of deep sequencing, provide a key step towards the definition of 
this microbiome’s functional capacity and the host genes that poten- 
tially contribute to microbial association phenotypes. Such plant genes 
would constitute major agronomic targets. 

We used 454 pyrosequencing to sequence 16S ribosomal RNA 
(rRNA) gene amplicons for DNA prepared from eight diverse, inbred 
A. thaliana accessions. Plants were grown from surface-sterile seeds in 
climate-controlled conditions in two diverse soils, termed
Mason Farm and Clayton (Supplementary Table 1; detailed in
Supplementary Information). For each soil, we assayed multiple individuals
from each A. thaliana accession across independent full-factorial
biological replicates, in which all genotypes and bulk soils (pots without
a plant) for that soil type were grown in parallel (Supplementary Table 2). We isolated
separate rhizosphere and EC fractions from individual plant root 
systems (Supplementary Fig. 1 and Supplementary Table 2). We 
established 1114F and 1392R as our primer pair (Supplementary 
Information and Supplementary Fig. 2). Using an otupipe-based 
pipeline (http://drive5.com/otupipe/), we grouped sequences into 
97%-identical operational taxonomic units (OTUs), reduced noise 
and removed chimaeras. We determined technical reproducibility
thresholds, from which we concluded that OTUs defined by ≥25 reads in ≥5 samples
(hereafter 25 × 5) are individually ‘measurable OTUs’*”” (Supplementary
Figs 2 and 10). All data reported here are from one run of our
otupipe-based pipeline (Supplementary Fig. 3 and Supplementary 
Database 1). 
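The 25 × 5 rule defined above keeps an OTU only if it reaches at least 25 reads in at least 5 samples. A minimal sketch follows, assuming the OTU table is a dictionary of per-sample OTU counts; this layout and the toy table are hypothetical, not the otupipe output format.

```python
def measurable_otus(table, min_reads=25, min_samples=5):
    """OTUs with >= min_reads reads in >= min_samples samples (the 25 x 5 rule)."""
    samples_passing = {}
    for counts in table.values():
        for otu, n in counts.items():
            if n >= min_reads:
                samples_passing[otu] = samples_passing.get(otu, 0) + 1
    return {otu for otu, k in samples_passing.items() if k >= min_samples}

# Six hypothetical samples.
table = {
    f"s{i}": {"otu_a": 30, "otu_b": 30 if i < 4 else 3, "otu_c": 24}
    for i in range(6)
}
print(sorted(measurable_otus(table)))  # ['otu_a']
```

Here otu_b fails because it reaches 25 reads in only four samples, and otu_c fails because it never reaches 25 reads in any sample.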

Excluding additional control samples, we ribotyped 1,248 samples 
comprising 111 bulk soil, 613 rhizosphere and 524 EC samples, 
generating 9,787,070 high-quality reads (Supplementary Figs 3 and 
4a-c). After removing plant-sequence-derived OTUs, we obtained a 
table of usable OTU read counts per sample containing 6,387,407 
reads distributed across 18,783 OTUs. We normalized this table of 
usable reads by rarefying to 1,000 reads per sample (Supplementary 
Database 2a) or, alternatively, by dividing the reads per OTU in a 
sample by the sum of usable reads in that sample, resulting in a table 
of relative abundances (frequencies) (Supplementary Database 2b).
Using the 25 × 5 threshold, we defined 778 measurable OTUs representing
54% (3,463,632) of the usable reads (Supplementary Fig. 4c
and Supplementary Table 3). The diversity of the 778 measurable 
OTUs in soil, rhizosphere and EC fractions showed expected relative 
trends when compared with the diversity by fraction of all usable 
OTUs (Supplementary Fig. 4d). We display the rarefaction-normalized 
data; parallel analyses of frequency-normalized data are provided in 
Supplementary Figures. 
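The frequency normalization mentioned above (reads per OTU divided by the usable reads in that sample) and the share of usable reads captured by the measurable OTUs are simple to compute. The sketch below assumes the same hypothetical dictionary layout of per-sample OTU counts; the toy numbers are invented.

```python
def to_frequencies(table):
    """Divide each OTU count by the total usable reads of its sample."""
    freqs = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        freqs[sample] = {otu: n / total for otu, n in counts.items()} if total else {}
    return freqs

def share_of_reads(table, otus):
    """Fraction of all usable reads that fall into the given set of OTUs."""
    total = sum(n for counts in table.values() for n in counts.values())
    kept = sum(n for counts in table.values() for otu, n in counts.items() if otu in otus)
    return kept / total

table = {"ec_1": {"otu_a": 300, "otu_b": 100}, "soil_1": {"otu_a": 50, "otu_c": 550}}
print(to_frequencies(table)["ec_1"])     # {'otu_a': 0.75, 'otu_b': 0.25}
print(share_of_reads(table, {"otu_a"}))  # 0.35
```

Applied to the totals reported in the text, 3,463,632 / 6,387,407 ≈ 0.542, consistent with the stated 54% of usable reads.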

We used principal coordinate analysis on pairwise, normalized, 
weighted UniFrac distances between all samples, considering all usable 
OTUs, to identify the main factors driving community composition 
(Fig. 1a and Supplementary Fig. 5a). The first principal coordinate
(PCo1) revealed that the two bulk soils and their associated rhizospheres
were differentiated from the respective EC fractions. Soil type was the 
main factor in the second component (PCo2). This pattern was recapitulated
by hierarchical clustering of pairwise Bray-Curtis dissimilarities
considering only measurable OTUs (Fig. 1b and Supplementary 
Fig. 5b). Samples harvested at different developmental stages clustered 
together, indicating that this variable does not have a major effect on 
overall community composition (Fig. 1 and Supplementary Fig. 5a, b; 
yng versus old, where yng refers to the time of appearance of an 
inflorescence meristem and old refers to fruiting plants with greater 
than 50% senescent leaves). Additional control samples from the 
reference genotype Col-0 harvested from four independent digs of 
Mason Farm soil underscored the reproducibility of these bacterial 
community profiles (Supplementary Fig. 6). Together, these data
demonstrate that the interaction of diverse soil communities with
plants determines the assembly of the rhizosphere, leading to
winnowed ECs, that the ECs from at least these two diverse soils are
very different from the starting soil communities and that there is little
difference in communities over host developmental time.

Figure 1 | Sample fraction and soil type drive the microbial composition of
root-associated endophyte communities. a, Principal coordinate analysis of
pairwise, normalized, weighted UniFrac distances between samples based on
rarefaction to 1,000 reads in unthresholded, usable OTUs. CL, Clayton; MF,
Mason Farm; R, rhizosphere; S, soil. b, Rarefied counts for the 25 × 5
thresholded, measurable OTUs from each of 24 soil, stage or fraction groups
were log2-transformed (Methods) to make 24 representative samples (branch
labels), and pairwise Bray-Curtis similarity was used to cluster these
representatives hierarchically (group-average linkage).
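Principal coordinate analysis, as used for Fig. 1a, embeds samples so that Euclidean distances between points approximate a given dissimilarity matrix. The sketch below implements classical PCoA (Gower double-centring of -0.5 * D² followed by an eigendecomposition) in pure Python. It uses shifted power iteration with deflation instead of a linear-algebra library, assumes a symmetric distance matrix, and does not compute UniFrac itself; the example distances are invented.

```python
import math

def pcoa(dist, n_axes=2, iterations=1000):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns one list of sample coordinates per axis, ordered by eigenvalue.
    """
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # Gower double-centring: B = J A J with J = I - 11'/n.
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]
    axes = []
    for _ in range(n_axes):
        # Shift by a Gershgorin bound so power iteration finds the largest
        # (most positive) eigenvalue even if some eigenvalues are negative.
        shift = max(sum(abs(x) for x in row) for row in b)
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        axes.append([x * math.sqrt(max(eigenvalue, 0.0)) for x in v])
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return axes

points = [0.0, 1.0, 5.0, 6.0]  # hypothetical samples lying on a line
dist = [[abs(p - q) for q in points] for p in points]
pco1, pco2 = pcoa(dist)
print([round(x, 6) for x in pco1])  # [-3.0, -2.0, 2.0, 3.0]
```

For distances that are exactly Euclidean, as here, PCo1 recovers the centred positions and the remaining axes collapse to zero; non-Euclidean dissimilarities can yield negative eigenvalues, which this sketch simply truncates to zero.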

We fitted a general linear mixed model (GLMM) to samples from 
each set of plant fractions (rhizosphere or EC), plus the bulk soil 
controls, to identify measurable OTUs whose abundances differ sig- 
nificantly between plant and bulk soil as a result of soil type, develop- 
mental stage, fraction and genotype (Supplementary Information and 
Supplementary Database 3). This approach allowed us to quantify the 
contribution from each variable to the community composition 
(Supplementary Table 4). Controlling for sequencing plate effects, 
plant fraction is the most important factor; its effect is strongest for 
the EC, consistent with our UniFrac and Bray-Curtis analyses. Soil
type is less important, followed by experiment, developmental stage 
and, finally, genotype, which had a small but consistent effect. 

Hierarchical clustering of sample groups considering 256 OTUs 
identified by the GLMM to differentiate rhizosphere and EC from soil 
recapitulated the separation of EC from soil and rhizosphere (Fig. 2A 
and Supplementary Fig. 7a, left; compare with Fig. 1 and Supplemen- 
tary Fig. 5). Of these, 164 OTUs were enriched in EC samples (Fig. 2B, 
a; dark and light red bars), defining an A. thaliana ‘EC microbiome’. Of 
these 164, 97 were enriched in EC samples from both soil types 
(Fig. 2B, a; dark red bars), potentially representing a core EC micro- 
biome. By contrast, 67 of these 164 were enriched in EC to a greater 
extent in one soil than the other (Fig. 2B, a; light red bars; Fig. 2B, b).
Importantly, 32 OTUs were depleted in EC samples (Fig. 2B, a; 
blue bars). Some OTUs exhibited rhizosphere enrichment; these 
significantly overlapped the EC-enriched OTUs (P< 10 *°, one-sided 
hypergeometric test) and also sometimes had a soil-type component 
(Fig. 2B, c and d). Only a few rhizosphere-specific enrichments were 
not also enriched in the EC (Supplementary Table 3). Hence, the
A. thaliana EC microbiome is enriched both for a shared set of
OTUs commonly assembled across two replicates from two diverse
soils and for a set of OTUs that are assembled in a soil-specific manner.
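The one-sided hypergeometric test used above asks how likely an overlap at least as large as the observed one would be if the rhizosphere-enriched OTUs were drawn at random from all tested OTUs. A minimal sketch using exact binomial coefficients from the Python standard library follows; the counts in the example are invented, because the size of the rhizosphere-enriched set is not given in this section.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """One-sided P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all OTUs tested; successes: OTUs in the first set (e.g. EC-enriched);
    draws: size of the second set (e.g. rhizosphere-enriched); k: observed overlap.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(max(k, 0), upper + 1))
    return tail / total

# Hypothetical numbers: 778 tested OTUs, 164 EC-enriched,
# 60 rhizosphere-enriched, 40 in both sets.
print(hypergeom_sf(40, 778, 164, 60))
```

Under these invented numbers the expected overlap is only about 60 × 164 / 778 ≈ 12.6 OTUs, so an overlap of 40 gives a vanishingly small P value.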

We assessed taxonomic distributions, first those of the 778 
measurable OTUs in soil, rhizosphere and EC fractions, and then 
those of the 256 EC-enriched and 32 EC-depleted OTUs (Fig. 2A, 
Supplementary Fig. 7a and Supplementary Table 3). Measurable
OTUs were distributed across seven dominant phyla (Fig. 2C and
Supplementary Fig. 7c) and contained ~50–70% of the usable reads
in all fractions (Supplementary Fig. 4c). The phylum distribution of the
EC-enriched OTUs reflected that of the entire EC. Conversely, the phylum
distribution of the EC-depleted OTUs typically resembled that of the
rhizosphere fraction (Fig. 2C). The lower Shannon diversity of the EC
fraction is consistent with enrichment for a subset of dominant phyla.
Specifically, the EC microbiome was dominated by Actinobacteria, 
Proteobacteria and Firmicutes, and was depleted of Acidobacteria, 
Gemmatimonadetes and Verrucomicrobia, when soil types were con- 
sidered either together or separately (Fig. 2C, Supplementary Figs 7c 
and 15 and Supplementary Table 5). Lower-order taxonomic analysis 
(Fig. 2D and Supplementary Fig. 7d) demonstrated that enrichment of 
a low-diversity Actinobacteria community in the EC was driven by a 
subset of families, predominantly Streptomycetaceae. 
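The link drawn above between lower Shannon diversity and enrichment for a few dominant taxa follows from how the index is defined. A minimal sketch, assuming the usual Shannon index H′ = −Σ pᵢ ln pᵢ over OTU relative abundances; this section does not state which logarithm base or implementation was used (some tools default to base 2), and the function name is illustrative.

```python
import math
from typing import Iterable


def shannon_diversity(counts: Iterable[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)
```

For the same number of OTUs, a community dominated by one OTU, such as `[97, 1, 1, 1]`, scores lower than an even one, such as `[25, 25, 25, 25]` (which gives ln 4).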

Other phyla, such as Proteobacteria, were represented by both EC 
enrichments and EC depletions at the family level (Fig. 2E and 
Supplementary Fig. 7e). Strikingly, two alphaproteobacterial families, 
Rhizobiaceae and Methylobacteriaceae, and two gammaproteobacter- 
ial families, Pseudomonadaceae and Moraxellaceae, dominated the 
EC population in their respective classes (Fig. 2F, α and γ, and
Supplementary Fig. 7f, α and γ). Equally striking was the EC
redistribution of particular alpha- and gammaproteobacterial families 
that were common in soil and rhizosphere (Fig. 2F and Supplementary 
Fig. 7f). 

Specific OTUs, three from the family Streptomycetaceae and one
from the order Sphingobacteriales, demonstrate the robustness of EC
enrichments (Fig. 3a–d and Supplementary Fig. 11a–d). A few OTUs
were either significantly enriched in rhizosphere but not in the EC
(Fig. 3e, f, Supplementary Fig. 11e, f and Supplementary Table 3), or
were associated with one of the two developmental stages (Fig. 3g, h,
Supplementary Fig. 11g, h and Supplementary Table 3). Data in Fig. 2,
Supplementary Fig. 7, Fig. 3, Supplementary Fig. 11 and Supplementary
Table 3 demonstrate that entire taxa at various levels are enriched
in or depleted from the EC microbiome. Additionally, rhizosphere taxa
capable of colonizing the root vicinity are nonetheless prevented from
colonizing the EC.

Several OTUs differentiated inbred A. thaliana accessions. 
Genotype-dependent enrichments and depletions were significant 
but weak (Supplementary Tables 5 and 3). To identify accession- 
dependent effects specific to a soil type or a developmental stage, we 
fitted a partial GLMM that modelled each genotype against bulk soil 
for each experiment or developmental stage group, and tested the 
model’s predictions with a non-parametric Kruskal–Wallis test
corrected for multiple testing (Supplementary Information). We considered
only those significant accession-dependent effects that were
present in the same direction in both biological replicates. We further
required that these OTUs have a consistent prediction in the full
GLMM, which narrowed the field to 12 OTUs (or 27 with frequency-normalized
data; Supplementary Table 3). In Fig. 3, we display relative
abundances of two such OTUs, one for each soil type, both 
Actinobacteria (Fig. 3i, j and Supplementary Fig. 11i, j). That these 
enrichments were detected by the full GLMM (which accounts for plate
effects due to 454 sequencing) and were sequenced over several plates
(Supplementary Fig. 14) supports a true genotype effect. Thus, a small
subset of the EC microbiome is likely to be quantitatively influenced by 
host-genotype-dependent fine-tuning in specific soil environments. 
This could allow compensatory contributions of the EC microbiome 
and host genome variation to overall metagenome function. 

Because the rhizoplane is stripped during preparation of EC 
fractions, we confirmed the presence of live bacteria on roots using 
catalysed reporter deposition and fluorescence in situ hybridization 
(CARD-FISH) to whole Col-0 root segments’*. Eubacteria were 
common on unsonicated roots (Fig. 4a). Actinobacteria detected with 
probe HGC69a were visible on the surface of roots grown in Mason 
Farm soil, and co-localized with a subset of the eubacterial signals 
using double CARD-FISH (Fig. 4b), suggesting that their enrichment 
in EC fractions either comes from, or egresses through, the rhizoplane. 
Figure 3 | Dot plots of notable OTUs. Counts for each OTU (number at top keyed to Supplementary Table 3) from the rarefied table were log2-transformed and the counts for each sample plotted as an individual symbol. The y axis is labelled with the actual (untransformed) counts. a–h, Each position on the x axis is labelled with a symbol to represent the sample group, and samples from that group are plotted in the column directly above. Biological replicates in the same column have different hues. The median of each replicate is shown with a horizontal black bar; some are invisible because they are at 0. i, j, Each x-axis position is labelled by Arabidopsis accession; samples from that accession are plotted above each label. Each OTU in the figure has model … in several categories (Supplementary Table 3).

[Figure 3 panels: a, OTU 140834, family Streptomycetaceae; b, OTU 14402, family Streptomycetaceae; c, OTU 17382, family Streptomycetaceae; d, OTU 2324, order Sphingobacteriales; e, family Comamonadaceae; f, family Bradyrhizobiaceae; g, phylum Cyanobacteria; h, family Flavobacteriaceae; i, OTU 12803, family Streptomycetaceae; j, OTU 6115, family Micromonosporaceae.]

Similarly, we confirmed the rare presence on the rhizoplane of 
Bradyrhizobiaceae (Supplementary Fig. 12c), a family with members 
defined by the GLMM as more abundant in Mason Farm rhizosphere 
than Mason Farm EC (Fig. 3f and Supplementary Fig. 11f). We 
enumerated the relative number of CARD-FISH signals on a set of 
filters made from equal amounts of material harvested in the same way 
as were the samples processed for pyrotag sequencing (Supplementary 
Fig. 12a, b). We confirmed that Actinobacteria were found in higher 
abundance, and that Bradyrhizobiaceae were present in lower 
abundances, in EC samples than in the bulk soil and rhizosphere 
samples. We also noted that emerging lateral roots were typically
heavily colonized by a variety of bacteria (Supplementary Fig. 12d),
consistent with previous observations’. These results provide
PCR-independent support for our sequencing methods.

We present a reduced-complexity, robust experimental platform 
with which to study root microbiota. Our data, and similar conclusions 
presented in a companion publication” using a similar platform, 
provide the deepest analysis available regarding the principles of root 
microbiome assembly for any plant species. Remarkably, our conclusions
are very similar to those in ref. 20, and we identify phylum- and
family-level enrichments in the EC fraction that largely overlap with
those reported in ref. 20. We note three main differences between our
study and that of ref. 20: different soils from a different continent, a 
different primer pair and a different portion of root harvested (top 
3 cm in ref. 20; whole root here). 

A subset of the soil bacterial population is typically enriched in 
rhizosphere samples’. Thus, a diverse bacterial community can surround
the root surface and thrive there, recruited by biophysical and/or
host-derived metabolic cues. We demonstrate that the A. thaliana
microbiome undergoes a dramatic loss of diversity as the spatial level
of plant–microbe ‘intimacy’ further increases from the external
rhizosphere to the intercellular EC. Both common and soil-type- 
specific OTUs are established inside roots grown in diverse soils. A 
small number of bacterial taxa, particularly the Actinobacteria family 
Streptomycetaceae, and several Proteobacteria families, are highly 
enriched in the EC. Actinobacteria are well known for production of 
antimicrobial secondary metabolites’, and many proteobacterial 
families contain plant-growth-promoting members. Conversely,
several taxa (Acidobacteria, Verrucomicrobia and Gemmatimonadetes,
and various proteobacterial families) that are common in soil and
rhizosphere are depleted from the EC. This depletion suggests that
these taxa are actively excluded by the host immune system,
outcompeted by more-successful EC colonizers or metabolically unable
to colonize the EC niche. Our identification of a limited-diversity
EC enables detailed characterization of the isolates comprising the
core A. thaliana microbiome, which could inform the design of
community-based plant probiotics.

Figure 4 | CARD-FISH confirmation of Actinobacteria on roots. A single
set of Mason Farm yng Col-0 roots was fixed and stained using CARD-FISH.
DAPI, 4′,6-diamidino-2-phenylindole. Double CARD-FISH was applied using
the EUB338 eubacterial probe (green) and either the NON338 probe (a), which
is the nonsense negative control of EUB338, or the HGC69a Actinobacteria
probe (b). Inset, twofold enlargement of boxed region. Scale bars, 50 μm.

Within the EC, we identified rare cases of quantitative variation in 
the enrichment of specific bacteria at two developmental stages or by 
different host genotypes, consistent with rare genotype-dependent 
associations noted in ref. 20. The former result suggests that the EC 
microbiome is robust to the source-sink differences across these two 
developmental stages, which may be related to the relatively high 
frequency of putative saprophytes defined in ref. 20. The latter result 
suggests that host genetic variation can drive differential recruitment
of beneficial microbes, differential exclusion, or both. A limited-diversity
EC microbiome with common features suggests similar host
needs across A. thaliana, potentially extending to other plant taxa. 
These are probably fulfilled by contributions from a limited number 
of bacterial taxa across diverse soils. The identification of genotype- 
specific endophyte associations in particular soils may signal 
interactions that meet environment-specific host needs, balancing 
contributions of EC microbiome and host genome variation to 
overall metagenome function. These two generalities suggest that the 
A. thaliana root microbiome might assemble by core ecological 
principles similar to those shaping the mammalian microbiome, in 
which core phylum-level enterotypes provide broad metabolic potential
combined with modest levels of host-genotype-dependent associations 
that individualize the metagenome”'. Isolation and characterization 
of the microbes that define host-genotype-dependent associations, and 
characterization beyond the 16S rRNA gene, should be particularly instructive
in unravelling the molecular rules contributing to endophytic coloniza- 
tion and persistence. 


METHODS SUMMARY 


Custom methods of soil harvesting, seed sterilization and germination were 
developed to ensure no microbial carry-over during transplantation into natural
soils. Seedling growth and harvesting conditions were developed to maximize 
consistency. PCR primers were evaluated. The JGI multiplexed 454 sequencing 
pipeline was used to derive primary data, which were processed using standard
methods and custom scripts. Analyses were performed on simplified data sets 
defined using a GLMM with statistical corrections. In situ methods were adapted 
to observe specific microbes defined by the phylotyping pipeline. All these steps 
are detailed in Supplementary Information. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


General strategy. Seed sterility was verified by plating and deep-sequencing of 
homogenates from sterile seedlings (Supplementary Fig. 13). We established 
seedling growth, harvesting and DNA preparation pipelines as detailed in the 
specific sections below. We defined the bacterial community within each soil, 
and the community associated with plant roots across a number of controlled 
experimental variables: soil type, plant sample fraction, plant age and plant 
genotype. For plant age, we harvested roots from two developmental stages: at 
the formation of an inflorescence meristem (yng) and during fruiting when ≥50%
of the rosette leaves were senescent (old). The former represents plants at the peak 
of photosynthetic conversion to carbon, whereas the latter represents a stage well 
after the source-sink shift has occurred, marking the change in carbon allocation 
from vegetal to reproductive utilization’. We prepared two microbial sample 
fractions from each individual plant: a rhizosphere (bacteria contained in the layer 
of soil covering the outer surface of the root system that could be washed from 
roots in a buffer/detergent solution), and EC (bacteria from within the plant root 
system after sonication-based removal of the rhizoplane; Supplementary Fig. 1). 
We also collected control soil samples (soil treated in parallel, but without a plant 
grown in it). 

Soil collection and analysis. For each full-factorial experiment, the top 8 inches of
soil were collected with a shovel from two collection sites and transported to the lab
in closed plastic containers at room temperature. The first collection site,
Mason Farm, is located in Chapel Hill, North Carolina, USA (+35° 53′ 30.40″,
−79° 1′ 5.37″); it is managed by the North Carolina Botanical Garden and is free of
pesticide use and heavy human traffic. The second collection site is
the Central Crops Research Station in Clayton, North Carolina, USA
(+35° 39′ 59.22″, −78° 29′ 35.69″), which is also free of pesticide use. Visible weeds,
twigs, worms, insects and so on were removed with gloves, and the soil was then 
crushed with an aluminium mallet to a fine consistency and sifted through a sterile 
2-mm sieve. Because sieved soil from Mason Farm drained poorly and test plants 
grown in it suffered from hypoxia, we adopted the practice of mixing sterile 
(autoclaved) playground sand into both Mason Farm (MF) and Clayton (CL) soils 
at a soil:sand ratio of 2:1. Soil micronutrient analysis was performed on pure and 
2:1 mixed soils by the University of Wisconsin soil testing labs. 

Seed sterilization and germination. All seeds were surface-sterilized by a treatment
of 1 min in 70% ethanol with 0.1% Triton X-100, followed by 12 min in 10%
A-1 bleach with 0.1% Triton X-100, followed by three washes in sterile distilled
water. Seeds were spread on 0.5% agar containing half-strength Murashige &
Skoog (MS) vitamins and 1% sucrose. Seeds were stratified in the dark at 4 °C
for one week, then germinated at 24 °C under 18 h of light for one week. Seed coat
sterility was confirmed by lack of visible contamination on MS plates during 
germination, and also by absence of visible contamination after plating some of 
the whole seeds on KB, 1/10-strength LB and 1/10-strength ‘869’ bacterial growth
media. 

To address whether there were seed-borne microbes that might survive surface 
sterilization, one-week-old seedlings were taken from sterile MS plates and 
homogenized by aseptic bead beating under non-bacteriolytic conditions (three 
3-mm glass balls per 2-ml tube, with 300 μl PBS, using a FastPrep from MP Bio at
speed 4.0 m s−1 for 10 s). The homogenate was streaked onto 1/10-strength LB,
1/10-strength ‘869’ and KB media. No colonies were observed. To detect potential 
unculturable microbes, we pyrosequenced 16S amplicons from the same 
homogenates using bacteriolytic DNA preps from the genotypes Col-0, Cvi-0, 
Sha-0 and Tsu-0 (Supplementary Fig. 13). Each accession was individually 
barcoded and sequenced with 1114F and 1392R, yielding 21,935, 20,747, 23,141 
and 20,272 reads, respectively. A matching number of total reads was sampled 
from each accession using pooled data from the full experimental data set for 
comparative analysis. In total, 86,095 high-quality reads were obtained from each
of the two sets (non-sterile and sterile plants), the majority of which were chloroplast
sequences. See Supplementary Fig. 13 for results. 
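The four per-accession read counts reported above sum to the stated 86,095, which is the depth matched in the comparison with non-sterile plants. The sketch below shows one way to draw a matched number of reads per accession from pooled experimental data. The `pooled_reads` structure (accession mapped to a list of read identifiers) and the function name are hypothetical, and the original sampling procedure is only described in outline.

```python
import random

# Read counts from surface-sterilized seedlings, as reported in the text.
STERILE_READS = {"Col-0": 21_935, "Cvi-0": 20_747, "Sha-0": 23_141, "Tsu-0": 20_272}


def matched_subsample(pooled_reads: dict[str, list[str]],
                      targets: dict[str, int],
                      seed: int = 0) -> dict[str, list[str]]:
    """For each accession, draw (without replacement) as many reads from the pooled
    experimental data as were obtained from that accession's sterile seedlings."""
    rng = random.Random(seed)
    return {acc: rng.sample(pooled_reads[acc], n) for acc, n in targets.items()}


total_sterile = sum(STERILE_READS.values())
```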

Seedling growth. One-week-old healthy seedlings were aseptically transplanted 
from MS plates to sterile (autoclaved) 2.5-inch-square pots filled with either MF or 
CL soil, with one seedling per pot. Seedlings were transferred by lifting from 
underneath the cotyledon leaves using open tweezers; no pressure was applied 
to the hypocotyl. Some pots were designated ‘bulk soil’ and were not given a plant. 
All pots, including bulk soil controls, were always watered from the top with a 
shower of distilled water (non-sterile) as an accessible proxy for rain water that 
avoids chlorine and other tapwater additives. Pots were spatially randomized and 
placed in growth chambers providing short days of 8 h light (800–1,000 lx) at 21 °C
and 16 h dark at 18 °C. Short days were used to help synchronize flowering
time between A. thaliana genotypes and to facilitate robust rosette and root
growth. After the floral-transition harvest, the remaining plants and bulk soils
were moved from the growth chamber to 16-h days in the greenhouse to promote
more synchronized flowering and senescence for the senescent developmental stage.

Harvesting. Each plant was killed and harvested at one of two developmental time 
points: (1) at the floral transition and (2) after fruiting when senescence is well 
underway. We considered the floral transition to have begun when the shoot apical 
meristem was first apparent in five or more plants. Cvi-0, Sha-0 and Ct-1 
occasionally flowered one to two weeks earlier under our conditions than the other 
A. thaliana genotypes. The senescence harvest began when five or more plants 
showed 50% or more yellow and/or brown rosette leaves”; this occurred approxi- 
mately four to five weeks after transfer to the greenhouse. Senescence occurred in 
the same order as bolting (flowering). 

Our maximum harvesting and processing capacity was 30 plants per day, 
meaning that each harvesting period for each full-factorial biological replicate 
(90 pots) lasted between one and two weeks. On each harvest day, we strove to 
represent all genotypes and at least one bulk soil to avoid confounding
harvesting artefacts with genotype effects. Because we harvested as many pots
each day as time allowed, we did not always harvest in multiples of our 
genotype number and did not have equal representation of each genotype on each 
harvest day. 

The aboveground plant organs were aseptically removed. Loose soil was 
manually removed from the roots by kneading and shaking with sterile gloves 
(sprayed with 70% EtOH) and by patting roots with a sterile (flamed) metal 
spatula—this ‘neighbouring soil’ fell to the sterile (flamed) work surface. We 
followed the established convention of defining rhizosphere soil as extending up 
to 1 mm from the root surface”’ and we removed loose soil on all root surfaces until 
remaining aggregates were within this range. Roots were placed in a clean and 
sterile 50-ml tube containing 25 ml phosphate buffer (per litre: 6.33 g of
NaH2PO4·H2O, 16.5 g of Na2HPO4·7H2O and 200 μl Silwet L-77). Tubes were
vortexed at maximum speed for 15 s, which released most of the rhizosphere soil
from the roots and turned the water turbid. The turbid solution was then filtered
through a 100-μm nylon mesh cell strainer into a new 50-ml tube to remove
broken plant parts and large sediment. The roots were transferred from the empty 
tube to a new sterile 50-ml tube with 25-ml sterile phosphate buffer, and the turbid 
filtrate was centrifuged for 15 min at 3,200g to form a pellet containing fine
sediment and microorganisms. 
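The buffer recipe above gives grams of each phosphate salt per litre but states neither molarity nor pH. The short calculation below converts the recipe into concentrations and an idealized Henderson–Hasselbalch pH. The molar masses (137.99 and 268.07 g/mol) and the thermodynamic pKa2 of phosphoric acid (about 7.21 at 25 °C) are standard values supplied here, not taken from the source. Because activity effects are ignored, and the ionic strength of this buffer is roughly 0.2 M, the measured pH would be expected to be somewhat lower than the idealized figure.

```python
import math

# Molar masses (g/mol) of the two hydrated phosphate salts (standard values).
MOLAR_MASS = {"NaH2PO4.H2O": 137.99, "Na2HPO4.7H2O": 268.07}
# Grams per litre, from the buffer recipe in the text.
GRAMS_PER_LITRE = {"NaH2PO4.H2O": 6.33, "Na2HPO4.7H2O": 16.5}

acid = GRAMS_PER_LITRE["NaH2PO4.H2O"] / MOLAR_MASS["NaH2PO4.H2O"]    # mol/l of H2PO4-
base = GRAMS_PER_LITRE["Na2HPO4.7H2O"] / MOLAR_MASS["Na2HPO4.7H2O"]  # mol/l of HPO4(2-)

PKA2 = 7.21  # assumed thermodynamic pKa2 of phosphoric acid at 25 °C
# Henderson–Hasselbalch with concentrations in place of activities.
ph_ideal = PKA2 + math.log10(base / acid)

print(f"H2PO4-: {acid * 1e3:.1f} mM, HPO4(2-): {base * 1e3:.1f} mM, "
      f"total phosphate: {(acid + base) * 1e3:.1f} mM, idealized pH: {ph_ideal:.2f}")
```

This works out to roughly 46 mM monobasic and 62 mM dibasic phosphate, or about 0.11 M phosphate in total.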

Most of the supernatant was removed and the loose pellets were resuspended 
and transferred to 1.5-ml microfuge tubes, which were then spun at 10,000g for 
5 min to form tight pellets, from which all supernatant was removed. These
rhizosphere pellets, averaging 250 mg, were flash-frozen in liquid nitrogen and 
stored at −80 °C until processing. The root systems, while in the 25 ml of new
buffer, were cleaned of remaining debris with sterile tweezers and transferred to 
new sterile buffer tubes until the buffer was clear after vortexing (without major 
sediment on the tube bottom). The roots were then sonicated in a Diagenode 
Bioruptor at low frequency for 5 min (five 30-s bursts alternating with 30-s rests).
The sonication further disrupted tiny soil aggregates and attached microbes, 
cleaning the root exterior. We opted for physical removal of surface microbes 
by sonication instead of killing them with bleach because sequencing measures 
DNA; at lower concentrations, bleach kills microbes without necessarily destroy- 
ing the DNA. Although an extended bleach treatment would also destroy 
unwanted DNA, it could also enter roots and destroy DNA of interest. 

After sonication, the roots were snap-frozen, freeze-dried to remove ice and 
then stored at −80 °C until processing. Our rhizosphere and EC fractions were
collected using time-practical protocols designed to partition sequencing-quality 
DNA and may differ slightly from classic definitions of these fractions that rely 
on partitioning culturable bacteria. We note that sonication may leave some 
rhizoplane microbes behind, especially if they are in a microniche shielded from 
the ultrasound. Such artefacts may cause our collected fractions to differ from 
theoretical definitions. 

DNA extraction. To extract DNA, the samples were resuspended in a lysis buffer 
and microbial cells were mechanically lysed through bead beating. For all bulk soil 
and rhizosphere data, bead beating and purification were performed with the 
MoBio PowerSoil kit (SDS/mechanical lysis) because of its unmatched ability to 
remove humics and other PCR inhibitors in our soil. EC DNA from Arabidopsis 
experiments was prepared with the MP Bio Fast DNA Spin Kit for soil (also an
SDS/mechanical lysis method) because its more intense bead-beating protocol and lysis matrix
gave improved lysis of whole roots and higher DNA yield, and soil PCR inhibitors
were less of a problem with these samples. Our procedure yielded around 1 μg of
DNA per rhizosphere sample, and more total DNA for EC samples (although a
significant portion of EC DNA sequenced was of host origin). Although the MoBio
PowerSoil and MP Bio Fast DNA kits use highly similar bead-beating/mechanical lysis
methods, we developed a custom method of sample pre-homogenization that
allowed us to prepare some EC samples using the MoBio kit. A comparison of
Col-0 soil, rhizosphere and EC fractions across four soil digs of MF, where EC was
prepared using MoBio in two digs and MP Bio in the other two digs, shows that
although we cannot rule out a slight kit effect, both kits produce highly similar 
clustering separating EC from rhizosphere and soil fractions (Supplementary 
Fig. 6, replicates 3 and 4). DNA quantity was assessed with the Quant-iT 
PicoGreen dsDNA Assay Kit (Invitrogen) and a plate fluorospectrometer. 

PCR. For each 1114F-barcoded 1392R primer set, PCR reactions with ~10 ng of 
template were performed in triplicate along with a negative control to reveal 
contamination. The PCR program was 95 °C for 3 min, followed by 30 cycles
of 95 °C for 30 s, 55 °C for 45 s and 72 °C for 1 min, followed by 72 °C for
10 min and then cooling to 16 °C. We first verified by gel electrophoresis that the
no-template control did not contain DNA, then pooled the three replicate PCR
products and quantified DNA from each pool with PicoGreen (Invitrogen).
Pooled PCR products from 30–48 barcoded samples were then combined in
equimolar ratios into a master DNA pool, which was cleaned with the MoBio
UltraClean PCR Clean-Up kit before submission for standard JGI pyrosequencing 
using a half-plate of Roche 454-FLX with titanium reagents. 
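Combining barcoded PCR pools "in equimolar ratios" requires converting each PicoGreen mass concentration into a molar one and then choosing volumes. The sketch below does this using the common approximation of about 660 g/mol per base pair of dsDNA. The amplicon length, target amount, sample names and readings are all hypothetical. Because every amplicon here spans the same 1114F to 1392R region, equimolar pooling is close to equal-mass pooling.

```python
def fmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to fmol/ul (about 660 g/mol per bp)."""
    return conc_ng_per_ul * 1e6 / (660.0 * length_bp)


def equimolar_volumes(concs: dict[str, float], length_bp: int,
                      target_fmol: float) -> dict[str, float]:
    """Volume (ul) of each barcoded PCR pool that contributes target_fmol to the master pool."""
    return {sample: target_fmol / fmol_per_ul(c, length_bp) for sample, c in concs.items()}


if __name__ == "__main__":
    # Hypothetical PicoGreen readings (ng/ul) and an assumed ~300-bp amplicon length.
    readings = {"sample_01": 12.4, "sample_02": 8.1, "sample_03": 20.0}
    for name, vol in equimolar_volumes(readings, length_bp=300, target_fmol=100.0).items():
        print(f"{name}: {vol:.2f} ul")
```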

454 pyrotag sequencing. To identify organisms present in each sample, 454 
sequencing of the SSU rRNA genes was performed. For 454 sequencing, the 
SSU rRNA genes present in each sample were amplified with the primers 1114F 
and 1392R containing the 454 adaptors”. Each sample was assigned a reverse 
primer with a unique 5-bp barcode, allowing 30–48 samples to be pooled per half-plate.
In preparation for sequencing, working aliquots of the master pool were
immobilized on beads and amplified by emulsion PCR (emPCR), the emulsion was broken
with isopropanol, DNA-carrying beads were enriched and the enriched beads
were loaded on the instrument for sequencing. During the emPCR protocol, we
reduced the amplification primer amount from 460 μl in the standard protocol to
58 μl per emulsion cup, the same amount of primer used in the paired-end
emPCR protocol. We loaded 1,750,000 beads in each plate
region (reduced from 2,000,000 beads per region in the standard protocol). A
detailed standard protocol is available on request. 
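A 5-bp barcode allows 4^5 = 1,024 distinct sequences, far more than the 30–48 samples pooled per half-plate. The sketch below illustrates barcode-based demultiplexing. It assumes each read begins with the 5-bp barcode immediately followed by the 1392R primer. That layout is an assumption for illustration, since the actual 454 read structure depends on the adaptor design, and in this study demultiplexing was handled by the QIIME-based pipeline described below. All function names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

REVERSE_PRIMER = "ACGGGCGGTGTGTRC"  # 1392R without its 5-bp barcode


def primer_matches(seq: str, primer: str) -> bool:
    """True if seq starts with a sequence compatible with the degenerate primer."""
    return len(seq) >= len(primer) and all(b in IUPAC[p] for b, p in zip(seq, primer))


def demultiplex(reads, barcodes, primer=REVERSE_PRIMER, bc_len=5):
    """Assign reads to samples by their leading barcode and strip barcode + primer.

    reads: iterable of sequences assumed to start with barcode, then primer
    barcodes: mapping of upper-case 5-bp barcode -> sample name
    Returns (sample -> trimmed reads, number of unassigned reads).
    """
    by_sample = {name: [] for name in barcodes.values()}
    unassigned = 0
    for read in reads:
        read = read.upper()
        sample = barcodes.get(read[:bc_len])
        rest = read[bc_len:]
        if sample is None or not primer_matches(rest, primer):
            unassigned += 1
            continue
        by_sample[sample].append(rest[len(primer):])
    return by_sample, unassigned
```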

Primer test and technical reproducibility. We first tested three sets of broad- 
specificity 16S rRNA 5’ primers* (Supplementary Fig. 2a,b) and established 
technical reproducibility metrics. We used 13 samples chosen from each of the 
three sample fractions (soil, rhizosphere and EC) and both soil types (MF and CL) 
(Supplementary Fig. 2c). Each sample was amplified individually with each of the 
forward primers (804F, which broadly targets bacteria and archaea; 926F, a 
universal primer; and 1114F, which broadly targets bacteria), paired with the 
barcoded universal reverse primer (1392R) and sequenced twice to measure 
technical reproducibility. We identified bacteria by grouping highly similar (97% 
identity) sequences into OTUs (Supplementary Methods). We chose 1114F for our 
experiments, on the basis of its broad coverage of the bacterial domain” and higher 
usable data yield (Supplementary Fig. 2f–i and Supplementary Fig. 10).

For this primer test, OTUs were picked by grouping highly similar (97% identity)
sequences using a standard QIIME (quantitative insights into microbial
ecology)-based pipeline’ with default settings; thus, this stand-alone test
consists of a different set of OTUs from those described in this work. The primer test
samples are included in our submitted data and are found on 454 half-plates 26b
and 27a. The progressive drop-out analysis, displaying the coefficient of
determination (R²) of the least-squares regression between the two technical replicates
as low-abundance OTUs are sequentially discarded, was calculated in
R with a custom script.
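The progressive drop-out analysis was run in R with a custom script that is not reproduced here. The Python sketch below re-expresses the idea under two stated assumptions: an OTU counts as "low abundance" when its mean count across the two technical replicates falls below the threshold, and raw counts (not log-transformed ones) are regressed. Neither choice is specified in the source. For a simple least-squares line, R² equals the squared Pearson correlation, which is what `r_squared` computes.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y ~ x (= Pearson r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one replicate has no variance
    return sxy * sxy / (sxx * syy)


def progressive_dropout(rep1, rep2, thresholds):
    """For each threshold, keep OTUs whose mean count across the two technical
    replicates is >= threshold and report (threshold, OTUs kept, R^2)."""
    results = []
    for t in sorted(thresholds):
        kept = [(a, b) for a, b in zip(rep1, rep2) if (a + b) / 2 >= t]
        if len(kept) < 3:
            break  # too few OTUs left for a meaningful fit
        xs, ys = zip(*kept)
        results.append((t, len(kept), r_squared(xs, ys)))
    return results
```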

Primer specificity and sequence. 804F (prokaryote): 5'-agattagatacccdrgtagt-3'.

926F (universal): 5'-actcaaaggaattgacgg-3'.

1114F (bacteria): 5'-gcaacgagcgcaaccc-3'.

1392R (barcoded universal; XXXXX, 5-bp sample barcode): 5'-XXXXXacgggcggtgtgtrc-3'.
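Two of these primers are degenerate and use IUPAC ambiguity codes: in 804F, D stands for A, G or T and R for A or G, and 1392R also contains an R. The sketch below shows how a degenerate primer is matched against a template site and how many distinct oligonucleotides it encodes. It checks only the forward orientation at the start of the given sequence, and the function names are illustrative.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMERS = {
    "804F": "AGATTAGATACCCDRGTAGT",
    "926F": "ACTCAAAGGAATTGACGG",
    "1114F": "GCAACGAGCGCAACCC",
    "1392R": "ACGGGCGGTGTGTRC",  # barcode omitted
}


def matches(site: str, primer: str) -> bool:
    """True if `site` (5'->3') is compatible with a degenerate primer at every position."""
    site, primer = site.upper(), primer.upper()
    return len(site) >= len(primer) and all(b in IUPAC[p] for b, p in zip(site, primer))


def expand(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    n = 1
    for p in primer.upper():
        n *= len(IUPAC[p])
    return n
```

For example, 804F encodes 3 × 2 = 6 distinct oligonucleotides, whereas 1114F is non-degenerate.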

Sequence processing pipeline and assignment of OTUs. As each 454 plate was 
sequenced, raw reads from individual plates were immediately run through 
PYROTAGGER” to diagnose plate quality so that plates could be re-queued if 
necessary. Plates with a reasonable number of long, high-quality raw reads with 
matching barcodes were used in the final analysis of OTU picking and taxonomy 
assignment. Using QIIME 1.4.0’, short reads were removed, the remaining
reads were trimmed to 220 bp, and low-quality reads were removed from the
analysis using default quality settings (http://qiime.org/scripts/split_libraries. 
html). These high-quality sequences were clustered into OTUs using a custom 
script derived from otupipe (http://drive5.com/otupipe). The three main steps 
used from otupipe include (1) de-replicating sequences to reduce the size of the 
data set and the run time of clustering analysis, (2) de-noising sequences by 
forming clusters of 97% identity and representing these with the consensus 
sequence, and (3) forming OTUs by clustering de-noised consensus sequences 
at 97% identity. 
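The sketch below illustrates dereplication followed by greedy, abundance-ordered clustering at 97% identity. It is a deliberate simplification and not the otupipe script. It merges the denoising and OTU-forming steps into a single pass, uses centroid sequences instead of consensus sequences, and measures identity as the fraction of matching positions. That last choice is reasonable only because all reads were trimmed to a common 220-bp length, and it ignores indels, which alignment-based tools such as USEARCH handle. Function names are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length reads and ignores indels."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def greedy_otus(reads, threshold=0.97):
    """Dereplicate reads, then greedily cluster unique sequences (most abundant
    first) to centroids at >= threshold identity. Returns centroids and read counts."""
    uniques = Counter(reads)  # step 1: dereplication
    centroids, sizes = [], []
    for seq, n in sorted(uniques.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, c in enumerate(centroids):
            if identity(seq, c) >= threshold:
                sizes[i] += n  # join the first sufficiently similar OTU
                break
        else:  # no centroid close enough: this sequence seeds a new OTU
            centroids.append(seq)
            sizes.append(n)
    return centroids, sizes
```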

The consensus sequence of sequences in each OTU was used as a representative 
sequence. Each representative sequence was assigned a taxonomy by two methods: 
(1) using the RDP classifier” trained on the 4 February 2011 Greengenes reference 


sequences and (2) by assigning the Greengenes”' taxonomy of the best BLAST hit 
within a combined database including the complete Greengenes 16S database and 
18S A. thaliana sequences from NCBI. By the BLAST-based method, sequences 
without a hit below the E-value threshold of 0.001 are considered unclassified. 

Once OTUs were assigned a taxonomy, all OTUs annotated as chloroplasts, 
Viridiplantae or Archaea by any of the methods were removed from the OTU 
table, resulting in the set of usable OTUs. 
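The removal rule above drops an OTU if any taxonomy method labels it as chloroplast, Viridiplantae or Archaea. A minimal sketch, assuming Greengenes-style taxonomy strings (for example "k__Bacteria; p__Cyanobacteria; c__Chloroplast") and simple case-insensitive substring matching. The input structure and function name are illustrative.

```python
EXCLUDED_TERMS = ("chloroplast", "viridiplantae", "archaea")


def usable_otus(assignments: dict[str, list[str]]) -> set[str]:
    """Keep OTUs whose taxonomy strings from every method avoid the excluded terms.

    assignments maps OTU id -> taxonomy strings, one per method
    (e.g. [RDP classifier string, best-BLAST-hit Greengenes string]).
    """
    return {
        otu
        for otu, labels in assignments.items()
        if not any(term in label.lower() for label in labels for term in EXCLUDED_TERMS)
    }
```

Non-chloroplast Cyanobacteria survive this filter, which is consistent with a Cyanobacteria OTU appearing among the notable OTUs in Fig. 3g.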

We pooled usable reads from each bulk soil and rarefied to 200,000 reads per 
soil; this was permuted 100 times. We observed a median of 9,709 OTUs in MF soil 
and 9,897 OTUs in CL soil. Rarefaction curves to 200,000 reads in each bulk soil 
(not shown) indicated that, even at 200,000 reads, we were not capturing the entire 
community in either soil. Consequently, the total number of OTUs we report for 
our bulk soils may be lower than that found in some reports aimed at finding the 
true microbial diversity in soils. 
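The soil richness estimate described above draws a fixed number of reads without replacement from the pooled reads, counts distinct OTUs, repeats the draw and takes the median. A minimal sketch with hypothetical names; at the 200,000-read depth used here, the pooled list of read labels would simply be longer.

```python
import random
import statistics


def rarefied_richness(otu_counts: dict[str, int], depth: int,
                      permutations: int = 100, seed: int = 0) -> float:
    """Median number of distinct OTUs observed when `depth` reads are drawn without
    replacement from the pooled reads, repeated `permutations` times."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the number of pooled reads")
    rng = random.Random(seed)
    richness = [len(set(rng.sample(pool, depth))) for _ in range(permutations)]
    return statistics.median(richness)
```

Plotting the median richness against increasing `depth` gives a rarefaction curve. A curve that is still rising at the maximum depth shows that the community has not been fully sampled.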

A handful of samples had been sequenced more than once, over more than one 
454 half-plate (for example to increase the read depth from problematic samples). 
These duplicated samples were pooled into a single sample by adding the 
unnormalized counts in the OTU table, and the resulting column was renamed 
to reflect the pooling that took place. Next, any sample that had fewer than 50
usable reads was discarded, resulting in the unnormalized usable OTU table. At 
this point, both a frequency table and a rarefied table (1,000 usable reads per 
sample) were created as alternative normalization techniques. 
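The two clean-up steps above can be sketched as follows. Runs of the same sample are first merged by summing raw counts, and any sample with fewer than 50 usable reads is then discarded. The dictionary layout and the run-to-sample name mapping are hypothetical stand-ins for the OTU table and its renamed columns.

```python
from collections import Counter


def merge_and_filter(table: dict[str, dict[str, int]],
                     pooled_name: dict[str, str],
                     min_reads: int = 50) -> dict[str, Counter]:
    """Merge re-sequenced runs of the same sample by summing raw counts, then drop
    samples with fewer than `min_reads` usable reads.

    table: run name -> {OTU id: raw read count}
    pooled_name: run name -> merged sample name (unlisted runs keep their name)
    """
    merged: dict[str, Counter] = {}
    for run, counts in table.items():
        merged.setdefault(pooled_name.get(run, run), Counter()).update(counts)
    return {s: c for s, c in merged.items() if sum(c.values()) >= min_reads}
```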

The frequency table was made from the unnormalized usable OTU table by 
dividing the number of reads for each OTU in a given sample by the total number 
of reads in that sample and multiplying by 100, and repeating this across all samples. 
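The frequency normalization just described is a per-sample percentage. A minimal sketch, with an illustrative function name:

```python
def to_percent(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Relative abundance (%) of each OTU within each sample."""
    result = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        result[sample] = {otu: 100.0 * n / total for otu, n in counts.items()}
    return result
```

This normalization also shows where the granularity problem discussed below comes from. In a sample with only 50 usable reads, each read is worth 2%, so relative abundances can only change in 2% steps.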

We also created a rarefied table; because some samples, particularly samples from
the EC, had fewer than 1,000 usable reads in the unnormalized usable OTU table, 
counts from independent samples sharing the same soil type, genotype, fraction, age 
and experiment were pooled to make groups of at least 1,000 reads, and the sample 
names were changed to reflect the pooling that had taken place 
(Rarefaction_MappingFile... in Supplementary Database 1). Then all samples were 
rarefied to 1,000 counts using the rrarefy() function in the vegan package of R*. 
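The pool-then-rarefy procedure above was carried out in R with vegan's rrarefy(). The Python sketch below substitutes random subsampling without replacement (`random.Random.sample`) and makes several assumptions of its own. Samples that already have at least 1,000 reads are left unpooled. Low-depth samples are pooled greedily, in sorted name order, within their group until the pool reaches 1,000 reads. Any group that never reaches the threshold is dropped. Pooled units are renamed by joining member names with "+". The source does not specify these details, and the names are hypothetical.

```python
import random
from collections import Counter


def pool_and_rarefy(table, group_of, depth=1000, seed=0):
    """Pool low-depth samples within a group until they reach `depth` reads,
    then subsample every unit to exactly `depth` reads without replacement.

    table: sample -> {OTU id: count}
    group_of: sample -> group key, e.g. a (soil, genotype, fraction, age, experiment) tuple
    """
    rng = random.Random(seed)
    units = {}
    pending = {}  # group key -> (member sample names, accumulated counts)
    for sample in sorted(table):
        counts = Counter(table[sample])
        if sum(counts.values()) >= depth:
            units[sample] = counts
            continue
        key = group_of[sample]
        names, acc = pending.setdefault(key, ([], Counter()))
        names.append(sample)
        acc.update(counts)
        if sum(acc.values()) >= depth:
            units["+".join(names)] = acc  # renamed to record the pooling
            del pending[key]
    # Groups still below `depth` at this point are left out.
    return {name: Counter(rng.sample(list(c.elements()), depth)) for name, c in units.items()}
```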

We present both methods because each has advantages and limitations. The 
advantage of the frequency table is that it keeps each individual plant separate, 
contains more individual samples and uses all of the data, but this comes at the cost 
of increased granularity in the normalized relative abundance percentages for 
some of the samples with fewer reads, causing problems with direct comparability. 
The major advantage of the rarefied table is that comparisons are not biased by 
sampling depth and all read counts have equal weight, but this comes at the cost of 
reduced sample number and samples that mix information from several replicated 
individuals because we needed to pool some of our samples to meet our rarefaction 
threshold, and also at the cost of higher overall granularity because we discarded 
many reads from more deeply sequenced samples. 

Because the majority of OTUs were represented by a very small number of reads
and these OTUs were not technically reproducible (Supplementary Fig. 2d, e),
both the rarefaction-normalized and the frequency-normalized OTU tables were
thresholded to generate measurable OTUs for the majority of analyses (the major
exception being the UniFrac analysis in Fig. 1: weighted UniFrac distance is robust
to rare OTUs). An OTU was deemed measurable if and only if there were ≥25
reads in ≥5 samples in the unnormalized usable OTU table. As described in the
text and Supplementary Fig. 2, this threshold was derived from two observations:
the correlation between the abundances of the same OTU in technical replicates
improved greatly as OTUs approached an abundance of 25 reads, and, although
contamination might create an OTU at this abundance once, the probability of an
OTU being spurious decreases greatly if it occurs at a measurable level in several
(we chose ≥5) independent samples.
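The measurability rule (at least 25 reads in at least 5 samples of the unnormalized usable table) can be expressed as a short filter. The sketch below assumes a hypothetical nested-dictionary table layout.

```python
from collections import Counter


def measurable_otus(otu_table, min_reads=25, min_samples=5):
    """Return OTUs with >= min_reads reads in >= min_samples samples.

    otu_table: dict sample -> dict OTU -> unnormalized usable reads.
    """
    n_samples_passing = Counter()
    for counts in otu_table.values():
        for otu, n in counts.items():
            if n >= min_reads:
                n_samples_passing[otu] += 1
    return {otu for otu, k in n_samples_passing.items() if k >= min_samples}
```

Counting qualifying samples per OTU, rather than summing reads, is what encodes the "several independent samples" requirement. A single heavily contaminated sample cannot push an OTU over the threshold on its own.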
Detection of differentially enriched OTUs by the GLMM. The OTU
abundances were analysed with a GLMM to estimate the effect of the different
variables on each measurable OTU. The lme4 R package was used to fit the
model. The abundance of each OTU in each sample (y_ij) was log-transformed
and modelled as a function of the abundance of the same OTU in bulk soil samples
(std_check) as a fixed effect, with plant genotype (b1), sample type (plant or bulk
soil, b2), plant developmental stage (b3), soil type (b4), sequencing half-plate (b5)
and biological replicate (b6) modelled as random effects. The full model is
specified by


$$y_{ij} = \beta \times \mathrm{std\_check} + b_{1i} + b_{2ij} + b_{3i} + b_{4i} + b_{5ij} + b_{6ij} + \varepsilon_{ij}$$


where ε_ij is the residual error and std_check was calculated as the mean abundance
of each OTU in all the bulk soil samples from each combination of experiment and
developmental stage.
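The construction of the `std_check` covariate can be sketched as below. The record layout and the `"bulk_soil"` label are hypothetical. The sketch also assumes that an OTU's absence from a bulk soil sample is recorded as an explicit zero. Otherwise the mean would be taken over non-zero samples only.

```python
from collections import defaultdict


def std_check(records):
    """Mean bulk-soil abundance of each OTU per experiment x stage.

    records: iterable of (experiment, stage, sample_type, otu, abundance).
    Zero abundances must be present as explicit records.
    """
    sums = defaultdict(float)
    n = defaultdict(int)
    for experiment, stage, sample_type, otu, abundance in records:
        if sample_type != "bulk_soil":  # hypothetical label for bulk soil
            continue
        key = (experiment, stage, otu)
        sums[key] += abundance
        n[key] += 1
    return {key: sums[key] / n[key] for key in sums}
```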

There were not enough paired samples of rhizosphere and EC from the same
individual plant to model the effect of both fractions directly. Instead, the
abundance table was split into EC and rhizosphere samples, and the effect of each
fraction with respect to bulk soil controls was estimated. The same model
specification was used independently on both fractions, and for both the frequency
and the rarefied tables (see Supplementary Methods on sequence processing
pipeline). The percentage of total variance explained by each random variable
on the OTU abundances is reported in Supplementary Table 5.

For each level of the random effects, the conditional mode and 95% prediction 
interval were estimated by Markov chain Monte Carlo sampling from the fitted 
model. A specific level is considered to have an effect on an OTU if the prediction 
interval of its conditional mode does not include zero. OTUs detected this way are 
reported in Supplementary Database 3. 
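The decision rule above (a level has an effect when its 95% interval excludes zero) can be sketched for a vector of MCMC draws. Using a simple percentile interval is an assumption: the text does not describe how the interval was constructed from the lme4 samples. The function name is hypothetical.

```python
import statistics


def level_has_effect(draws, n_quantiles=40):
    """True if the central 95% interval of the MCMC draws excludes zero.

    With n_quantiles=40, the first and last cut points returned by
    statistics.quantiles are the 2.5th and 97.5th percentiles.
    """
    cuts = statistics.quantiles(draws, n=n_quantiles, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return lower > 0 or upper < 0
```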

Partial GLMM. There were not enough samples to estimate all the interaction
effects between all variables without drastically reducing the size of the data set and
our statistical power (Supplementary Table 2). To assess specific interactions of the
genotype effect with other variables, we used a constrained version of the previously
defined GLMM that employed only the fixed effect (std_check) and
the random effects for plant genotype (b1) and sample type (b2). Samples were
split into groups of the same experiment, developmental stage and fraction (thus,
all the other variables from the full model are tested within each group), and the
model was fitted and analysed in the same way as the full GLMM. A
non-parametric Kruskal–Wallis test was used to verify independently the predictions
of the partial GLMM for significance. P values were corrected to Q values
using the Benjamini–Hochberg false discovery rate (FDR) method, and predictions from each partial
GLMM with a Q value >0.05 were discarded as insignificant. The intersection
of the significant genotype predictions between both biological replicates of each
condition was calculated. The intersection analysis from the partial GLMM is
displayed in Supplementary Table 3.
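The Benjamini–Hochberg correction from P values to Q values, followed by the Q > 0.05 cut, can be sketched as below. The Kruskal–Wallis P values themselves could come from, for example, `scipy.stats.kruskal`. The example P values are made up for illustration, and `bh_qvalues` is a hypothetical name.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (Q values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that Q values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


p = [0.01, 0.04, 0.03, 0.005]  # illustrative Kruskal-Wallis P values
q = bh_qvalues(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
```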

Scanning electron microscopy sample preparation. Arabidopsis roots were fixed
in 2% paraformaldehyde, 2.5% glutaraldehyde and 0.15 M sodium phosphate
buffer, pH 7.4. The samples were dehydrated using a graded ethanol series
(30%, 50%, 75%, 100%, 100%) and dried in a Samdri-795 supercritical dryer using
carbon dioxide as the transitional solvent (Tousimis Research Corporation). Roots
were mounted on aluminium planchets with double-sided carbon adhesive and
coated with 10 nm of gold–palladium alloy (60:40 Au:Pd, Hummer X Sputter
Coater, Anatech USA). Images were made using a Zeiss Supra 25 FESEM
operating at 5 kV and a working distance of 5 mm, and with a 10-µm aperture
(Carl Zeiss SMT Inc.), at the Microscopy Services Laboratory, Pathology and
Laboratory Medicine, UNC at Chapel Hill.

Log2 transformation. All log2 transformations on OTU tables followed the
formula log2(1000x + 1), where x is the rarefied read count (or frequency) per
OTU.
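The log2 transformation is a one-line formula. The sketch below applies it element-wise to one hypothetical sample. The +1 offset maps zero abundance to zero, so absent OTUs stay finite.

```python
import math


def log2_transform(x):
    """log2(1000x + 1) as specified for the OTU tables; x is a rarefied
    read count or a frequency."""
    return math.log2(1000 * x + 1)


sample = {"OTU_1": 0.0, "OTU_2": 0.255}  # hypothetical frequencies
print({otu: round(log2_transform(v), 3) for otu, v in sample.items()})
```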

Heat maps. Heat maps were constructed using custom scripts and the function
heatmap.2 from the R package gplots. For better visualization, all data were
log2-transformed. Hierarchical clustering of rows and columns in the heat maps is
based on Bray–Curtis similarities and uses group-average linkage.
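The two ingredients of this clustering can be sketched as below. Bray–Curtis similarity is taken as 1 minus the Bray–Curtis dissimilarity. Group-average (UPGMA) linkage scores a pair of clusters by the mean pairwise similarity of their members. This is a hypothetical Python illustration, not the gplots code, and it does not settle whether the clustering was run on raw or transformed values.

```python
def bray_curtis_similarity(a, b):
    """1 - Bray-Curtis dissimilarity between two abundance profiles.

    a, b: dict OTU -> abundance (missing OTUs count as zero).
    """
    otus = set(a) | set(b)
    diff = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    total = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return 1.0 - diff / total


def group_average_similarity(cluster1, cluster2, profiles):
    """Group-average (UPGMA) linkage: mean pairwise similarity between
    every member of cluster1 and every member of cluster2."""
    sims = [bray_curtis_similarity(profiles[i], profiles[j])
            for i in cluster1 for j in cluster2]
    return sum(sims) / len(sims)
```

During agglomerative clustering, the two clusters with the highest group-average similarity are merged at each step.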

Diversity. The Shannon diversity index and the non-parametric Chao1 diversity
estimate were calculated with the vegan package in R. The exponential function was
applied to the Shannon diversity index to calculate the true Shannon diversity
(effective number of species).
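The three quantities can be sketched as below, using natural logarithms as vegan's diversity() does by default. The Chao1 shown is the common bias-corrected form S_obs + F1(F1 − 1)/(2(F2 + 1)), where F1 and F2 count singleton and doubleton OTUs. vegan's estimateR() may apply a slightly different correction, so the values are not guaranteed to match. Function names are hypothetical.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with reads."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def true_shannon(counts):
    """Effective number of species: exp(H)."""
    return math.exp(shannon(counts))


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    freq = Counter(counts)  # how many OTUs have exactly k reads
    s_obs = sum(1 for n in counts if n > 0)
    f1, f2 = freq[1], freq[2]
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Four equally abundant OTUs give an effective number of species of 4.
print(round(true_shannon([5, 5, 5, 5]), 6))
```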

Rarefaction curves. Rarefaction curves were made with custom scripts that 
sampled each sample fraction only once at each read depth. To reveal the variance 
in sampling, no attempt was made to smooth the curves by taking the average of 
repeated samplings. 
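A single-draw rarefaction curve, as described above, can be sketched as follows. Each depth gets one independent random subsample, and nothing is averaged. The input format and function name are hypothetical, and depths larger than the number of available reads are skipped.

```python
import random
from collections import Counter


def rarefaction_curve(counts, depths, seed=0):
    """One random draw per depth (no averaging), returning (depth, OTUs).

    counts: dict OTU -> reads for one sample fraction (hypothetical format).
    """
    rng = random.Random(seed)
    reads = list(Counter(counts).elements())
    curve = []
    for depth in depths:
        if depth > len(reads):
            continue  # cannot subsample more reads than exist
        curve.append((depth, len(set(rng.sample(reads, depth)))))
    return curve
```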

Taxonomy histograms and statistics. Taxonomy histograms were created using
custom scripts and visualized in GraphPad PRISM version 5.0 for Windows
(GraphPad Software, Inc., http://www.graphpad.com). The ‘low-abundance’
category was created to help remove visual clutter, and contained any taxonomic 
group that did not reach at least 5% in any one fraction. The Shannon diversity 
index was calculated as described above. Differences in distribution at varying 
taxonomic levels, and differences in Shannon diversity between soil, rhizosphere 
and EC fractions, were tested by weighted analysis of variance (to account for 
differing numbers of soil, rhizosphere and EC samples), invoking the central limit 
theorem (>60 samples in each group in all tests for both frequency-normalized 
and rarefaction-normalized tests). For more details about tests, see additional 
notation in Supplementary Table 5. 
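The ‘low-abundance’ grouping can be sketched as below. A taxon keeps its own category if it reaches at least 5% in any one fraction. Everything else is summed into ‘low-abundance’ in every fraction. The nested-dictionary layout is hypothetical.

```python
def lump_low_abundance(profiles, cutoff=5.0):
    """profiles: dict fraction -> dict taxon -> percent of reads."""
    # Taxa reaching the cutoff in at least one fraction stay separate.
    keep = {taxon for profile in profiles.values()
            for taxon, pct in profile.items() if pct >= cutoff}
    lumped = {}
    for fraction, profile in profiles.items():
        row = {taxon: pct for taxon, pct in profile.items() if taxon in keep}
        row["low-abundance"] = sum(
            pct for taxon, pct in profile.items() if taxon not in keep)
        lumped[fraction] = row
    return lumped
```

A taxon kept because it is abundant in one fraction is still shown separately in the other fractions, even where it falls below 5% there. This keeps the categories identical across bars.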

Sample clustering using UniFrac. A phylogenetic tree was built with the
representative sequence for each OTU, and the pairwise, normalized, weighted
UniFrac distance between samples was calculated. For UniFrac, representative sequences from all non-plant
OTUs, including those that did not meet the measurability threshold (≥25 reads in ≥5 samples), were
considered. UniFrac distances between samples are based on the fraction of branch
length that is unique to each sample in a shared phylogenetic tree composed of
OTU representative sequences from all samples. Thus, samples containing OTUs
of highly divergent sequences will be more distant from each other, because the
OTUs comprising each sample will occupy different major branches on the shared
phylogenetic tree of OTUs, whereas samples containing highly similar OTUs will
share these major branches. In weighted UniFrac, the branch length unique to
each sample is multiplied by the frequency at which that OTU occurs in the
sample. Thus, weighted UniFrac can detect differences between two samples that
contain the same set of OTUs at different relative abundances.
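A minimal sketch of the normalized weighted UniFrac distance follows, in its commonly used form. Each branch contributes its length times the absolute difference between the two samples' relative abundances beneath it. The sum is then divided by the abundance-weighted root-to-tip distances of all OTUs. The tree representation (a parent map with branch lengths) and the function names are hypothetical. This is not the software used in the study.

```python
from collections import defaultdict


def root_distance(tree, node):
    """Sum of branch lengths from `node` up to the root."""
    dist = 0.0
    while tree[node][0] is not None:
        dist += tree[node][1]
        node = tree[node][0]
    return dist


def weighted_unifrac(tree, a, b, normalized=True):
    """tree: dict node -> (parent, branch_length); the root's parent is None
    and leaves are OTU ids. a, b: dict OTU -> read count (or frequency)."""
    total_a, total_b = sum(a.values()), sum(b.values())
    below_a = defaultdict(float)  # share of a's reads beneath each node
    below_b = defaultdict(float)
    otus = set(a) | set(b)
    for otu in otus:
        share_a, share_b = a.get(otu, 0) / total_a, b.get(otu, 0) / total_b
        node = otu
        while node is not None:  # propagate the OTU's share up to the root
            below_a[node] += share_a
            below_b[node] += share_b
            node = tree[node][0]
    # Branch length times the abundance difference, summed over branches.
    u = sum(length * abs(below_a[node] - below_b[node])
            for node, (parent, length) in tree.items() if parent is not None)
    if not normalized:
        return u
    d = sum(root_distance(tree, otu) * (a.get(otu, 0) / total_a + b.get(otu, 0) / total_b)
            for otu in otus)
    return u / d


# x and y are sister OTUs; z sits on a separate major branch.
tree = {"root": (None, 0.0), "n1": ("root", 1.0), "x": ("n1", 1.0),
        "y": ("n1", 1.0), "z": ("root", 2.0)}
print(weighted_unifrac(tree, {"x": 1}, {"y": 1}), weighted_unifrac(tree, {"x": 1}, {"z": 1}))
```

The example shows the behaviour described in the text. Samples whose OTUs share a major branch (x and y, distance 0.5) are closer than samples on different major branches (x and z, distance 1.0).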

Principal coordinate analysis was performed using pairwise, normalized,
weighted UniFrac distances between all samples on the unthresholded but
normalized OTU tables, and the first two principal coordinates of UniFrac were
visualized with GraphPad PRISM version 5.0 for Windows.
CARD-FISH application to roots. We applied a modified version of a previously
described protocol. Briefly, several root systems from a bolting Col-0 plant grown in MF were
fixed using 4% formaldehyde in PBS at 4 °C for 3 h, washed twice in PBS and stored
in 1:1 PBS:molecular-grade ethanol at −20 °C. Treatments with lysozyme solution
(1 h at 37 °C, 10 mg ml^-1; Fluka) and achromopeptidase (30 min at 37 °C,
60 U ml^-1; Sigma) were used sequentially for prokaryotic cell-wall
permeabilization. Endogenous peroxidases were inactivated with methanol amended
with 0.15% H2O2 at room temperature for 30 min, and samples were washed again. Probes targeting
either the 16S or the 23S rRNA (EUB338 (5'-GCTGCCTCCCGTAGGAGT-3', 35%
formamide), NON338 (5'-ACTCCTACGGGAGGCAGC-3', 30% formamide),
HGC69a (5'-TATAGTTACCACCGCCGT-3', 25% formamide) and Brady4
(5'-CGTCATTATCTTCCCGCACA-3', 30% formamide)) were defined using
probeBase (http://www.microbial-ecology.net/default.asp), labelled with the enzyme
horseradish peroxidase at the 5' end (Invitrogen), diluted in hybridization buffer
(final concentration of 0.19 ng ml^-1) at each probe's optimum formamide
concentration, and hybridized at 35 °C for 2 h. Unbound probes were washed away
from samples in wash buffer (NaCl content adjusted according to the formamide
concentration in the hybridization buffer) at 37 °C for 30 min. Fluorescently
labelled tyramide was used for signal amplification, and samples were washed
before mounting on glass slides.

For double CARD-FISH, a subset of samples went through a second round of the
protocol, starting at the peroxidase inhibition step, with a second fluorescently
labelled tyramide used to distinguish the signals from each probe. Roots
were mounted on glass slides using Vectashield with DAPI (Vector Laboratories,
catalogue no. H-1200) as mounting solution, and sealed with nail polish for storage.
All microscopy images were made on a confocal laser scanning microscope (Zeiss 
LSM 710 META) located in the Biology Department at UNC. The Brady4 probe, 
which has not been used for this application previously, was tested on filters of 
cultured Bradyrhizobiaceae and three negative control cultured strains to determine 
the most specific formamide concentration in the hybridization buffer. 

For application of samples onto filters, bulk MF soil, rhizosphere and EC
samples from four sets of Col-0 roots were pooled and harvested in the way
described above before DNA extraction. Samples were then fixed as described
above and passed through a 10-µm filter. The concentrations of plant material
were equalized, and samples were sonicated in a water bath for 5 min. The sample
suspension was further diluted 1:500 in water and applied to a 25-mm
polycarbonate filter with a pore size of 0.2 µm (Millipore) using a vacuum
microfiltration assembly. Filters were embedded in 0.2% low-melting-point agarose and
dried, and CARD-FISH was applied as described above. For quantification of
bacteria, filters were visualized on a Nikon Eclipse E800 epifluorescence
microscope. Positive EUB338 probe signals that co-localized with a DAPI signal were
counted as Eubacteria. Cells were counted as Actinobacteria or Bradyrhizobiaceae
when the HGC69a or Brady4 probe, respectively, co-localized with both the
EUB338 and the DAPI signal.
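The counting rules above can be written as a small tally over detected objects. Each object is represented here as a hypothetical set of the signals seen at that location. The image analysis that produces those sets is outside the scope of this sketch.

```python
from collections import Counter


def count_cells(objects):
    """objects: list of sets naming the signals detected at one object.

    EUB338 + DAPI counts as Eubacteria; HGC69a (Actinobacteria) or
    Brady4 (Bradyrhizobiaceae) additionally requires EUB338 + DAPI.
    """
    counts = Counter()
    for signals in objects:
        if {"EUB338", "DAPI"} <= signals:
            counts["Eubacteria"] += 1
            if "HGC69a" in signals:
                counts["Actinobacteria"] += 1
            if "Brady4" in signals:
                counts["Bradyrhizobiaceae"] += 1
    return counts
```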
Sample naming in OTU tables. All sample names in OTU tables take the
following form: [soil type].[genotype].[sample number][fraction].[age].[experiment]_[plate].
For example, M21.Col.6E.old.M1_2b should be interpreted as [soil
type] = M21 = Mason Farm 2:1, [genotype] = Col = Col-0, [sample number] = 6,
[fraction] = E = endophyte compartment, [age] = old, [experiment] = M1 =
Mason Farm replicate 1, [plate] = 2b.
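The naming convention can be parsed with a regular expression, as sketched below. The pattern is an assumption derived from the single worked example: the sample number is taken as digits immediately followed by letters for the fraction. Renamed pooled samples may not follow it, in which case the function returns None.

```python
import re

# [soil type].[genotype].[sample number][fraction].[age].[experiment]_[plate]
SAMPLE_NAME = re.compile(
    r"^(?P<soil>[^.]+)\.(?P<genotype>[^.]+)\."
    r"(?P<number>\d+)(?P<fraction>[A-Za-z]+)\."
    r"(?P<age>[^.]+)\.(?P<experiment>[^_.]+)_(?P<plate>.+)$"
)


def parse_sample_name(name):
    """Split a sample name into its fields, or return None if it does
    not follow the documented pattern."""
    match = SAMPLE_NAME.match(name)
    return match.groupdict() if match else None


print(parse_sample_name("M21.Col.6E.old.M1_2b"))
# {'soil': 'M21', 'genotype': 'Col', 'number': '6', 'fraction': 'E',
#  'age': 'old', 'experiment': 'M1', 'plate': '2b'}
```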
A mutation in APP protects against Alzheimer’s
disease and age-related cognitive decline 
Hreinn Stefansson, Patrick Sulem, Daniel Gudbjartsson, Janice Maloney, Kwame Hoyte, Amy Gustafson, Yichin Liu,
Yanmei Lu, Tushar Bhangale, Robert R. Graham, Johanna Huttenlocher, Gyda Bjornsdottir, Ole A. Andreassen,
The prevalence of dementia in the Western world in people over the
age of 60 has been estimated to be greater than 5%, about
two-thirds of which are due to Alzheimer’s disease. The age-specific
prevalence of Alzheimer’s disease nearly doubles every 5 years after
age 65, leading to a prevalence of greater than 25% in those over the
age of 90 (ref. 3). Here, to search for low-frequency variants in the
amyloid-β precursor protein (APP) gene with a significant effect
on the risk of Alzheimer’s disease, we studied coding variants in
APP in a set of whole-genome sequence data from 1,795 Icelanders.
We found a coding mutation (A673T) in the APP gene that
protects against Alzheimer’s disease and cognitive decline in the
elderly without Alzheimer’s disease. This substitution is adjacent
to the aspartyl protease β-site in APP, and results in an
approximately 40% reduction in the formation of amyloidogenic peptides
in vitro. The strong protective effect of the A673T substitution
against Alzheimer’s disease provides proof of principle for the
hypothesis that reducing the β-cleavage of APP may protect against
the disease. Furthermore, as the A673T allele also protects against
cognitive decline in the elderly without Alzheimer’s disease,
Alzheimer’s disease and age-related cognitive decline may be mediated
through the same or similar mechanisms.

Amyloid plaques are a central pathological feature of Alzheimer’s
disease and largely consist of amyloid-β peptides. Amyloid-β is
formed through sequential proteolytic processing of APP, catalysed
by the β- and γ-secretases. The aspartyl protease β-site APP cleaving
enzyme 1 (BACE1), originally identified over a decade ago, cleaves
APP predominantly at a unique site, whereas the γ-secretase complex
cleaves the resulting carboxy-terminal fragment at several sites, with
preference for positions 40 and 42, leading to formation of
amyloid-β1–40 (Aβ1–40) and Aβ1–42 peptides. Alternative processing of APP at
the α-site prevents the formation of amyloid-β, as the α-site is located
within amyloid-β.

Over 30 coding mutations in the APP gene have been found. About
25 of these are pathogenic, in most cases resulting in autosomal
dominant Alzheimer’s disease with an early onset. Substitutions
at or near the β- and γ-proteolytic sites appear to result either in
overproduction of total amyloid-β or in a shift in the Aβ1–40:Aβ1–42 ratio
towards formation of the more toxic Aβ1–42 peptide, whereas
substitutions within the amyloid-β peptide are believed to result in formation
of amyloid-β with increased propensity for aggregation.

Until now, mutations in APP have not been implicated in the
common, late-onset form of Alzheimer’s disease, with the exception
of the rare variant N660Y, which was recently identified in one case
from a late-onset Alzheimer’s disease family. To search for
low-frequency variants in the APP gene with a significant effect on the risk
of Alzheimer’s disease, we tabulated coding variants in APP in a set of
whole-genome sequence data from 1,795 Icelanders. Variants present
in more than one individual were subsequently imputed into 71,743
chip-typed Icelanders using long-range phasing information,
followed by propagation of genotypes and generation of in silico
genotypes for 296,496 close relatives of chip-typed individuals who
had not been genotyped.

We then investigated the association of the variants in APP with
Alzheimer’s disease (Supplementary Table 1). The control group
included individuals who had lived to at least age 85 without a diagnosis
of Alzheimer’s disease. The most significant association was found with
rs63750847. The A allele of this single nucleotide polymorphism (SNP)
(rs63750847-A) results in an alanine to threonine substitution at
position 673 in APP (A673T). It was found to be significantly more
common in the elderly control group than in the Alzheimer’s disease
group (0.62% versus 0.13%; odds ratio (OR) = 5.29; P =
4.78 × 10^-7; Table 1), and is therefore protective against Alzheimer’s
disease. To confirm these results, we performed Sanger sequencing of
rs63750847 in 451 predicted carriers of rs63750847-A, including two
predicted homozygotes. All the predicted carriers were found to have
the correct copy number of rs63750847-A, confirming the results
obtained with imputation. We also confirmed results by genotyping
rs63750847 in 3,661 individuals (cases and controls), and found one
mismatch (0.027%; Supplementary Information). Previously, the
rs63750847 variant had been reported in a single individual without
a history of Alzheimer’s disease and in one affected member of a
family with late-onset Alzheimer’s disease, but was deemed to be
probably non-pathogenic. We found the variant in 3 out of 712
Norwegian (0.21% allelic frequency), 4 out of 390 Finnish (0.51% allelic
frequency) and 5 out of 590 Swedish (0.42% allelic frequency) samples.
The variant was also observed in 1 out of 7,020 chromosomes by the
National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing
Project, and in 3 out of 31,714 chromosomes from a North American
population using an exome SNP chip array (see Supplementary
Information).

Table 1 | APP A673T protects against Alzheimer’s disease

Analysis                                           1/OR   OR     P value        Frequency (%)  N_chip   N_in silico
AD                                                 -      -      -              0.13           2,199    849
AD versus population controls                      4.24   0.236  4.19 × 10^-5   0.45           57,174   22,074
AD versus population controls aged 85 or greater   5.29   0.189  4.78 × 10^-7   0.62           7,653    1,350
AD versus cognitively intact controls at age 85    7.52   0.133  6.92 × 10^-?   0.79           827      407

The table shows association results, comparing patients with Alzheimer’s disease (AD) to three different control groups.
Frequency (%), N_chip and N_in silico refer to the controls in each row; the top line gives numbers for patients with
Alzheimer’s disease only. N_chip, number of individuals with chip-based genotype information; N_in silico, number of
individuals with genealogy-based genotype information.
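The allelic frequencies quoted for the Norwegian, Finnish and Swedish samples follow from treating each carrier as heterozygous. That gives one variant allele per carrier among two chromosomes per person. That assumption is ours rather than stated in the text, but it reproduces the reported values. The helper name is hypothetical.

```python
def allele_frequency_percent(carriers, individuals):
    """Allele frequency (%) assuming every carrier is heterozygous,
    i.e. carries one variant copy out of 2 * individuals chromosomes."""
    return 100.0 * carriers / (2 * individuals)


for population, carriers, n in [("Norwegian", 3, 712), ("Finnish", 4, 390), ("Swedish", 5, 590)]:
    print(population, round(allele_frequency_percent(carriers, n), 2))
# Norwegian 0.21, Finnish 0.51, Swedish 0.42
```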

The effect of rs63750847-A is stronger when using elderly controls 
than when using general population controls (Table 1), which is a 
consequence of a greater frequency of the variant in the elderly. We 
estimate that the odds for carriers of rs63750847-A of reaching age 85 
are 1.47-fold the odds of non-carriers. 

As Alzheimer’s disease is a common disease with a late onset, it is 
informative in association studies to use a control group that includes 
those who have reached old age without deficits in cognition. Therefore, 
we examined the frequency of rs63750847 in a control group of
individuals who were cognitively intact at age 85, based on a score of 0 on
the Cognitive Performance Scale (CPS), a seven-category hierarchical
scale assessing cognitive function in the elderly (Supplementary
Information). We found an enrichment (0.79%; OR = 7.52,
P = 6.92 × 10^-?; Table 1) of rs63750847-A in this group, consistent
with a protective effect of rs63750847-A against Alzheimer’s disease.

To study further the effect of the A673T substitution on cognitive
decline in the elderly, we investigated cognitive function as measured
with CPS in 41 carriers of A673T in the age range 80–100 as well as in
3,673 non-carriers. The Resident Assessment Instrument for Nursing
Homes (RAI-NH), on which the CPS score is based, is applied on
average three times per year in Icelandic nursing homes. Because
the residency time in nursing homes in Iceland is on average 3–4 years,
many determinations of CPS made at different times are available for
most individuals (Supplementary Fig. 1). As expected, cognitive
function declines slowly but steadily with age, both in carriers and
non-carriers of A673T (Fig. 1). Analysing a total of 23,831 CPS scores for
the 3,673 non-carriers of A673T without a diagnosis of Alzheimer’s
disease (average of 6.49 determinations per individual), and 262 CPS
scores for the 41 carriers of A673T without a diagnosis of Alzheimer’s
disease (average of 6.39 determinations), we found on average a 1.03
unit difference between carriers and non-carriers across the 80–100
age range (Fig. 1; P = 0.0021), with the carriers having a score
indicative of better conserved cognition. The fact that the cognitive
function of non-carriers remained poorer than that of carriers of A673T after
removing known Alzheimer’s disease cases suggests that the protective
effect of A673T extends beyond the boundaries of the Alzheimer’s
disease phenotype.

Figure 1 | Cognition measured by CPS as a function of age. Shown are CPS
scores of carriers (red symbols) and non-carriers (blue symbols) of A673T as a
function of age (80–100 years). Each symbol represents the average CPS score of individuals at
the respective age (in years). Error bars represent ±1 standard error. The
jagged appearance of the graph for A673T carriers is due to the relatively small
number of data points (262 in total, representing 41 individuals, as compared to
23,831 data points representing 3,673 A673T non-carriers). Individuals with a
diagnosis of Alzheimer’s disease were not included in the analysis.

The A673T substitution is located at position 2 in the amyloid-β
peptide. Recently, an alanine to valine substitution at position 673 in
APP (A673V) was reported as being recessive for Alzheimer’s disease
with very early onset in a single Italian pedigree. Heterozygous
carriers of A673V in this pedigree were unaffected. We identified three
homozygous carriers of A673T in Icelandic samples, one of whom had
died at age 88, with the other two currently living at ages 67 and 83,
respectively. None of these homozygous carriers had a history of
dementia.

Our genetic data indicate that the A673T substitution in APP is 
protective against Alzheimer’s disease. The proximity of A673T to 
the proteolytic site of BACE1 suggested to us that the variant might 
result in impaired BACE1 cleavage of APP in the A673T carriers. 

To investigate the effect of A673T on proteolytic processing of APP,
we followed the formation of extracellular APP fragments generated
by APP processing at the β-site (sAPPβ) and α-site (sAPPα),
respectively, as well as production of the amyloidogenic peptides Aβx-40 and
Aβx-42, in 293T cells transfected with wild-type or mutant APP
(Fig. 2). By western blot analysis of cell supernatants (Fig. 2a), we
found that the A673T variant results in reduced production of
sAPPβ with a slight apparent increase in production of sAPPα as
compared to wild-type APP. We next confirmed these observations
using a quantitative sandwich immunoassay approach (Fig. 2b).
sAPPβ production from A673T was ~50% less than from wild-type
APP, whereas sAPPα trended non-significantly towards an increase.
We also found that the production of both amyloidogenic peptides
Aβx-40 and Aβx-42 was ~40% less by the A673T variant than by
wild-type APP (Fig. 2c, d). For comparison, we also analysed APP cleavage
by the pathogenic A673V variant, which has previously been found to
increase amyloidogenic processing of APP. In contrast to A673T, the
A673V substitution resulted in markedly increased APP processing at
the β-site (Fig. 2a, b), decreased processing at the α-site (Fig. 2a, b), and
greatly enhanced Aβx-40 and Aβx-42 production (Fig. 2c, d). For further
reference, we also looked at Aβx-40 and Aβx-42 production by APP
K670N/M671L, which has been reported to increase Aβx-40 and Aβx-42
production. We confirmed that neither the A673T nor the A673V
substitution interfered with detection in the enzyme-linked immunosorbent
assay (ELISA) (Supplementary Information). The change in the various
APP cleavage products seen with A673T shows that this substitution
reduces BACE1 cleavage of APP relative to wild-type APP, whereas
A673V and K670N/M671L both markedly increase APP cleavage
(Table 2). These results are consistent with the protective effect of
A673T against Alzheimer’s disease, as well as with the dramatic phenotypic
contrast between the T and V substitutions at position 673 in APP. These data
also illustrate clearly that position 673 of APP is capable of regulating
proteolytic processing by BACE1.

To confirm these observations, we used an in vitro BACE1 cleavage
assay to assess processing of a wild-type synthetic APP peptide substrate
compared to a peptide bearing the A673T substitution. The A673T APP
peptide was processed ~50% less efficiently than the wild-type
substrate, supporting the conclusion that it encodes a suboptimal BACE1
cleavage site (see Supplementary Information). The substrate specificity
of BACE1 has previously been investigated in synthetic model peptides,
showing that amino acid substitutions at position 673 in APP can be
tolerated. Interestingly, although wild-type APP seems to be a
relatively poor substrate for BACE1, most substitutions near the β-cleavage
site result in an increased rate of cleavage of synthetic peptides.
However, consistent with our findings, a threonine substitution at
position 673 of these APP peptide substrates leads to BACE1 cleavage
rates that are 50-fold less than for a valine substitution at the same
position. These data further support the conclusion that the A673T
substitution in APP reduces BACE1 cleavage relative to wild-type APP
substrates.

Figure 2 | A673T reduces BACE1 cleavage of APP. a, Western blot analysis of
293T cells transfected with wild-type (WT), A673T, A673V or K670N/M671L
APP compared to GFP. Total cellular APP was compared to sAPPβ and sAPPα
from cell supernatants. Note that sAPPβ is not detected from the K670N/M671L
APP transfection as these mutations alter the epitope recognized by the
anti-sAPPβ antibody. b, Immunoassay quantification of sAPPβ and sAPPα in
supernatants. c, d, ELISA quantification of Aβx-40 (c) and Aβx-42 (d) production
from the same 293T transfected cells. *P = 0.01, **P = 0.005, ***P = 0.001
(two-tailed t-test, compared to wild-type APP); values represent mean ± s.d. of
three replicates. The experiment was repeated independently three times.
Table 2 | APP cleavage products from transfected 293T cells

          Wild type      A673T          A673V          K670N/M671L
sAPPβ     17.8 ± 2.9     8.1 ± 0.7      51.5 ± 4.4     N/A
sAPPα     527 ± 18       564 ± 21       476 ± 42       518 ± 33
Aβx-40    2.9 ± 0.2      1.5 ± 0.1      10.4 ± 0.7     41.4 ± 5.1
Aβx-42    0.25 ± 0.02    0.14 ± 0.02    0.72 ± 0.08    3.11 ± 0.69

All reported values are in ng ml^-1. APP cleavage products were quantified from supernatants from
293T cells transfected with wild-type, A673T, A673V or K670N/M671L APP. Values represent
mean ± s.d. of three replicates from a single experiment. N/A, not applicable.
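The relative reductions for A673T can be read off Table 2 as (wild type − A673T) / wild type. The sketch below does this for the single-experiment means only, as an illustration rather than a reanalysis. It gives roughly 54% for sAPPβ, 48% for Aβx-40 and 44% for Aβx-42. These can be compared with the approximate figures quoted in the text. The dictionary and function names are hypothetical.

```python
# Means (ng ml^-1) for wild-type and A673T APP, taken from Table 2.
TABLE2_MEANS = {
    "sAPPbeta": (17.8, 8.1),
    "Abeta_x-40": (2.9, 1.5),
    "Abeta_x-42": (0.25, 0.14),
}


def percent_reduction(wild_type, mutant):
    """Reduction of the mutant relative to wild type, in percent."""
    return 100.0 * (wild_type - mutant) / wild_type


for product, (wt, a673t) in TABLE2_MEANS.items():
    print(product, round(percent_reduction(wt, a673t), 1))
```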


Our data show that position 673 of APP is critical for amyloidogenic 
processing of APP by BACE1. To our knowledge, A673T represents 
the first example of a sequence variant conferring strong protection 
against Alzheimer’s disease. The strong protective effect of A673T also 
provides further proof of principle for the idea that reducing BACE1 
cleavage of APP may protect against Alzheimer’s disease. Furthermore, 
the fact that the A673T substitution also protects against cognitive 
decline in the elderly without Alzheimer’s disease provides indirect 
support for the hypothesis that the pathogenesis of Alzheimer’s disease 
and normal cognitive decline of the elderly may be shared, at least in 
part. We therefore propose that Alzheimer’s disease may represent the 
extreme of the age-related decline in cognitive function. 


METHODS SUMMARY 


Patients with Alzheimer’s disease were enrolled through the Memory Clinic at 
Landspitali University Hospital. Diagnosis of Alzheimer’s disease was established 
according to NINCDS-ADRDA criteria or according to International Classification 
of Diseases, 10th revision (ICD-10) code F00 criteria. Cognitive function was 
assessed using the CPS, which is based on the Minimum Data Set for Nursing
Homes, MDS 2.0, of the RAI by InterRAI.

Genotype data for A673T (rs63750847) were based on whole-genome sequence
data generated from 1,795 Icelanders to a depth of at least 10×. Approximately
30 million markers (SNPs and indels) were imputed based on this set of
individuals. Sequencing by synthesis was performed on Illumina GAIIx and HiSeq2000
instruments using previously described methods. Long-range phasing of all
chip-genotyped individuals was performed using previously described methods.
SNPs that were identified and genotyped through sequencing were imputed into
all Icelanders who had been phased with long-range phasing, using the same model
as that used by IMPUTE. In silico genotypes were generated by imputing
genotypes into relatives of chip-genotyped individuals, using the fully phased
imputed and chip-typed genotypes of the available chip-typed individuals.
Association testing was performed using logistic regression, matching controls
to cases based on the informativeness of the imputed genotypes. Chip-typed
samples were assayed with Illumina bead chips containing from 300,000 to
2,500,000 SNPs. SNPs that did not pass a rigorous quality control test were
excluded, as were all samples with a call rate below 97%.

Human APP695 cDNA was cloned into the pRK vector and mutagenized using the
QuickChange site-directed mutagenesis kit (Stratagene), followed by transfection
into 293T cells. APP cleavage products were assessed both by western blots and by
immunoassays. Aβx-40 and Aβx-42 peptides were measured from cell supernatants
with sandwich ELISAs.

For further details, see Supplementary Information. 
Medulloblastoma is an aggressively growing tumour, arising in
the cerebellum or medulla/brain stem. It is the most common
malignant brain tumour in children, and shows tremendous
biological and clinical heterogeneity. Despite recent treatment
advances, approximately 40% of children experience tumour
recurrence, and 30% will die from their disease. Those who survive
often have a significantly reduced quality of life. Four tumour
subgroups with distinct clinical, biological and genetic profiles
are currently identified. WNT tumours, showing activated
wingless pathway signalling, carry a favourable prognosis under
current treatment regimens. SHH tumours show hedgehog
pathway activation, and have an intermediate prognosis. Group 3
and 4 tumours are molecularly less well characterized, and also
present the greatest clinical challenges. The full repertoire of
genetic events driving this distinction, however, remains unclear.
Here we describe an integrative deep-sequencing analysis of 125 
tumour-normal pairs, conducted as part of the International 
Cancer Genome Consortium (ICGC) PedBrain Tumor Project. 
Tetraploidy was identified as a frequent early event in Group 3 
and 4 tumours, and a positive correlation between patient age 
and mutation rate was observed. Several recurrent mutations 
were identified, both in known medulloblastoma-related genes 
(CTNNB1, PTCH1, MLL2, SMARCA4) and in genes not previously 
linked to this tumour (DDX3X, CTDNEP1, KDM6A, TBR1),
often in subgroup-specific patterns. RNA sequencing confirmed 
these alterations, and revealed the expression of what are, to our 
knowledge, the first medulloblastoma fusion genes identified. 
Chromatin modifiers were frequently altered across all subgroups. 
These findings enhance our understanding of the genomic 
complexity and heterogeneity underlying medulloblastoma, and 
provide several potential targets for new therapeutics, especially 
for Group 3 and 4 patients. 

As a first phase of the International Cancer Genome Consortium 
(ICGC) PedBrain Tumor Project (http://www.pedbraintumor.org), 
we have collected matched tumour and germline samples from 125 
medulloblastoma patients aged from 0 to 17 years (Supplementary 
Table 1). Whole-genome sequencing (WGS, n = 39) and whole-exome 
sequencing (WES, n = 21) were applied to a ‘discovery’ set, with a 
custom-capture approach used to sequence 2,734 genes in an additional 
‘replication’ set (n = 65). All tumour samples were obtained at primary 
diagnosis, before adjuvant therapy, and the distribution of molecular 
subgroups was similar across cohorts (Supplementary Fig. 1). 

Investigation of genome-wide somatic mutation allele frequencies 
identified several cases with a clear peak at approximately 25%, rather 
than the expected approximately 50% allele frequency for early, 
heterozygous events (Fig. 1a). Analysis of coverage depth and allele
frequencies in regions of copy-number change ruled out stromal
contamination and instead indicated a tetraploid baseline in the tumour
genome (Fig. 1b). Predicted ploidy status was confirmed by fluorescence
in situ hybridization (FISH) using multiple centromeric
probes in 17 out of 18 cases analysed (Fig. 1a). The extremely low
fraction of mutations at approximately 50% allele frequency indicates
that genome duplication occurred very early during tumorigenesis.
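The reasoning behind the ~25% peak can be made explicit with a short calculation. The sketch below rests on three assumptions: reads are drawn in proportion to allele copy number, any contaminating normal cells are diploid with no mutant copy, and the function names are hypothetical. It shows why a 25% peak alone is ambiguous, since a diploid tumour at 50% purity gives the same value. This is why copy-number-dependent allele frequencies were needed to separate the two explanations. The sketch also reproduces the BAF and coverage ratios quoted for a 4n baseline in the Fig. 1 legend.

```python
def expected_vaf(mutant_copies: int, total_copies: int, purity: float = 1.0) -> float:
    """Expected allele frequency of a clonal somatic mutation.

    Assumes reads are drawn in proportion to copy number and that the
    contaminating normal cells are diploid with no mutant copy.
    """
    mutant = purity * mutant_copies
    all_copies = purity * total_copies + (1.0 - purity) * 2
    return mutant / all_copies


def baf_and_coverage_ratio(major: int, minor: int, baseline: int = 4):
    """BAF at germline-heterozygous SNPs and coverage ratio in a pure tumour.

    BAF is the minor allele's share of copies; the coverage ratio is the
    total copy number relative to the baseline ploidy.
    """
    total = major + minor
    return minor / total, total / baseline


print(expected_vaf(1, 2))               # 0.5  heterozygous, diploid
print(expected_vaf(1, 4))               # 0.25 one mutant copy of four, tetraploid
print(expected_vaf(1, 2, purity=0.5))   # 0.25 diploid, 50% stromal contamination
for major, minor in [(3, 0), (2, 1), (3, 2)]:
    baf, ratio = baf_and_coverage_ratio(major, minor)
    print(f"{major}:{minor}  BAF={baf:.2f}  coverage ratio={ratio:.2f}")
```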


100 | NATURE | VOL 488 | 2 AUGUST 2012 


Some cases probably went through even higher polyploidy states 
before reaching an approximately 4n baseline (for example 
ICGC_MB45, displaying 4n chromosomes with 4:0 or 3:1 allele ratios; 
Supplementary Fig. 2). Across the discovery set, tetraploidy was most 
commonly observed in Group 3 (7 out of 13, 54%) and Group 4 
tumours (8 out of 20, 40%), followed by SHH (4 out of 14, 29%) and 
WNT tumours (1 out of 7, 14%). Interestingly, the four tetraploid SHH
tumours all harboured TP53 mutations and also displayed chromothripsis
(ref. 6). Tetraploid Group 3 and 4 tumours showed significantly
more large-scale copy-number alterations than diploid cases
(median 10 changes per tumour in tetraploid versus 4 per tumour in
diploid cases, P = 0.008, two-tailed Mann–Whitney U-test; Supplementary
Fig. 3). Thus, tetraploidy followed by genomic instability may be
an early driving event in a large proportion of Group 3 and 4
medulloblastomas, which pose a significant clinical challenge owing to their
dismal prognosis and lack of targeted treatment options. Novel classes
of drugs, such as mitotic checkpoint kinase or kinesin inhibitors, which
target the maintenance of tetraploidy through successive cell divisions,
may therefore represent a rational therapeutic strategy in these
cases. The value of tetraploidy as a prognostic marker also requires
further investigation.
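For readers unfamiliar with the test, the Mann–Whitney U statistic used for the copy-number comparison can be computed as below. This sketch uses a normal approximation without tie or continuity correction. It will therefore not necessarily reproduce the reported P value, which may have come from an exact or corrected method. Established implementations such as `scipy.stats.mannwhitneyu` handle these details. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, approximate two-sided P value).

    U counts the pairs (a, b), with a from x and b from y, in which a > b,
    counting ties as 1/2. The P value uses a normal approximation with no
    tie or continuity correction, so it is rough for small samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided
```

In this setting, `x` and `y` would hold per-tumour counts of large-scale copy-number alterations for tetraploid and diploid cases, respectively.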

The average somatic mutation rate in the WGS cohort was 0.52 per
megabase (Mb), with an average of 10.3 non-synonymous coding
single-nucleotide variants (SNVs) per tumour in the discovery cohort (Supplementary
Table 2). This is slightly higher than previously reported for
medulloblastoma, possibly owing to improved coverage and technical
sensitivity, but considerably lower than in deep-sequenced adult
tumours. There were significantly fewer transitions
among somatic alterations than in germline variation
(P = 4.6 × 10⁻⁷, Wilcoxon rank-sum test; Supplementary Fig. 4). All
coding somatic SNVs identified in the combined cohort are listed in
Supplementary Table 3.

We identified a positive correlation between genome-wide mutation
rate and patient age, as previously reported for coding mutations
(r = 0.35, P = 7.8 × 10⁻…, Pearson’s product-moment correlation;
Fig. 1c). Intriguingly, this association was more pronounced in diploid
tumours (r = 0.52, P = 3 × 10⁻…) and virtually absent in tetraploid
cases (r = 0.04, P = 0.5) (Supplementary Fig. 5a, b). A similar trend
was observed for non-synonymous mutations across the discovery 
cohort (Supplementary Fig. 5c). Coverage level did not correlate with 
mutation rate (Supplementary Fig. 5d). One explanation may be that 
all medulloblastomas originate during embryogenesis, with some 
tumours needing to accumulate more genetic ‘hits’ before becoming 
symptomatic. Alternatively, tumours arising in older patients may 
derive from more differentiated cells that require a greater number 
of alterations to undergo malignant transformation. Investigation of 
additional tumours from older patients may help to clarify this. 
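Pearson’s product-moment correlation, as used for the age versus mutation-rate analysis, is sketched below together with its t statistic on n − 2 degrees of freedom. Converting t to a P value requires the Student’s t distribution (for example via `scipy.stats.pearsonr`). The helper is illustrative only, and its name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson's r and the t statistic for H0: rho = 0 (df = n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        # Perfect correlation: the t statistic diverges.
        return r, math.copysign(math.inf, r)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t
```

Here `x` would be patient age and `y` the genome-wide mutations per Mb for each tumour.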

Five SHH tumours harbouring TP53 mutations had significantly more mutations than the remaining cases. These comprised three
previously described Li-Fraumeni syndrome (LFS)-associated tumours
with germline mutations (ref. 6), one newly identified LFS case (ICGC_MB23),
and one somatically mutated tumour (ICGC_MB34). The excess was seen both genome wide (mean 1.1
per Mb versus 0.43 per Mb, P = 4.5 × 10⁻…; two-tailed t-test) and for
Figure 1 | Tetraploidy is a frequent early event in medulloblastoma
tumorigenesis, and mutation rates vary with age and subgroup. 

a, Distributions of genome-wide somatic mutation allele frequencies (the 
proportion of sequence reads supporting a mutation) for diploid tumours (with 
a peak at ~50% for heterozygous events, n = 7) and tetraploid cases (with a 
peak at ~25%, n = 7). Insets show centromeric FISH for chromosomes 1 (red) 
and 11 (green), confirming the predicted ploidy status. b, Top left, rescaled 
tumour:germline coverage ratio, indicating copy-number gains (red) or losses 
(green). Bottom left, B-allele frequency (BAF) in the tumour at SNP positions
which are heterozygous in the germ line. Right, genome alteration print (GAP)
of segmented copy number and allele frequency profiles. Chromosomes with
predicted 3:0/2:1/3:2 allele ratios show a BAF of approximately 0/0.33/0.4 and
coverage ratios of approximately 0.75/0.75/1.25. Owing to random sampling,
the BAF for the 2:2 allele ratio is slightly below 0.5. c, Genome-wide somatic mutation rates
are positively correlated with patient age (n = 39). Grp, Group. d, Distribution
of somatic mutation rates by tumour subgroup (n = 39). P values were calculated
with a Wilcoxon rank-sum test with Bonferroni correction. SHH-p53,
SHH-subgroup tumours harbouring a somatic or germline TP53 mutation.

non-synonymous changes (mean 23 versus 8.8, P = 2.6 × 10⁻…).
Interestingly, the WNT subgroup, which typically shows a good
prognosis and few copy-number changes, had the next highest mutation
rate (Fig. 1d).

Forty-one somatic, coding, small insertions/deletions (Indels) were
identified across the cohort, with an average of 0.4 coding Indels per
case in the discovery set (range 0–2; Supplementary Table 4). Some
genes, however, were more commonly affected by Indels than SNVs.
For example, frameshift Indels in PTCH1 were detected in 6 out of 125
cases, whereas only 2 SNVs were observed. Recurrent Indels were also
seen in the chromatin modifiers MLL2, KDM6A (3 cases each) and
BCOR (2 cases).

In contrast to another paediatric brain tumour, glioblastoma, in
which we recently identified frequently recurrent hotspot mutations,
the majority of mutated genes in this study were unique to a single case
(587 out of 760 non-synonymous SNVs in the 125 cases, 77%),
demonstrating the pronounced genetic heterogeneity of medulloblastoma.
Twenty-five of these singleton mutations, and 53 SNVs in total,
were at positions listed in the COSMIC database of somatic alterations
in tumours (available at http://www.sanger.ac.uk/genetics/CGP/cosmic/).
This suggests a rare but important contribution of many known
cancer genes in medulloblastoma (Supplementary Table 5). Only 8
genes were somatically altered in more than 3% of the whole series:
CTNNB1 (15 cases, 12%); DDX3X (10 cases, 8%); PTCH1 (8 cases, 6%);
SMARCA4 (6 cases, 5%); MLL2 (6 cases, 5%); TP53 (somatically
mutated in 5 cases, 4%); KDM6A (5 cases, 4%); and CTDNEP1 (4 cases,
3%) (Fig. 2). These were also the only genes found to be significantly
altered when the combined cohort was analysed with MutSig. This algorithm
tests whether the mutations observed in a gene exceed what would be
expected from random background mutation processes. It takes
into account gene length and composition, silent to non-silent mutation
ratios, and other factors (see https://confluence.broadinstitute.org/display/CGATools/MutSig;
Supplementary Table 6). Large-scale
copy-number changes known to be associated with medulloblastoma,
such as formation of an isodicentric 17q and losses of 10q/9q/X,
were more frequently recurrent than SNVs (Supplementary Fig. 6a–e).
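MutSig itself accounts for gene composition, silent to non-silent ratios and other factors. A much simpler stand-in conveys the underlying idea, although it is not MutSig: ask how surprising the observed number of mutations in a gene is under a uniform per-base background rate. The sketch assumes a binomial model with one trial per coding base per sample. The function names and the example gene length are hypothetical. The rate of 0.52 mutations per Mb is the WGS average reported above.

```python
import math


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability P(X = i), computed in log space for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if k <= n * p:
        # Below the mean, the lower tail is the short sum.
        return max(0.0, 1.0 - sum(binom_pmf(i, n, p) for i in range(k)))
    total = 0.0
    for i in range(k, n + 1):
        term = binom_pmf(i, n, p)
        total += term
        if term < total * 1e-16:  # further terms are negligible past the mode
            break
    return total


def gene_burden_pvalue(observed: int, coding_bp: int, n_samples: int,
                       rate_per_mb: float) -> float:
    """P value for seeing >= `observed` mutations in one gene by chance.

    Simplification: every coding base in every sample mutates independently
    at the same background rate, unlike MutSig's gene-aware model.
    """
    trials = coding_bp * n_samples
    return binom_sf(observed, trials, rate_per_mb / 1e6)
```

For example, `gene_burden_pvalue(15, 2_000, 125, 0.52)` asks how likely 15 mutations would be in a hypothetical 2 kb coding sequence across 125 tumours at 0.52 mutations per Mb.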

Many alterations were enriched in specific medulloblastoma subgroups.
For example, all of the WNT tumours (15 out of 15) harboured
a mutation in CTNNB1, and 13 out of 15 displayed loss of one copy of
chromosome 6 (or acquired uniparental disomy in one case). Both
alterations have previously been associated with this subgroup.
Mutations in DDX3X were also clearly enriched in WNT tumours
(adjusted P = 7.06 × 10⁻…, two-tailed Fisher’s exact test with a
Bonferroni correction), and these mutations were clustered within
the helicase domain (Supplementary Fig. 7a). Three were localized at
the RNA-binding surface of the protein and three were predicted to
disrupt the closed (RNA-binding) conformation (Supplementary
Fig. 7b). The remaining four were predicted to disrupt indirectly either
the positive charge on the RNA-binding surface (n = 2) or the folding
of the closed form (n = 2). No truncating mutations were found,
Figure 2 | Subgroup specificity of common genetic alterations. Summary of
clinical data and recurrent alterations in the combined cohort (n = 125). Genes
found to be significantly mutated by MutSig analysis are included.
indicating an altered function rather than simply a loss of function. DDX3X
has recently been proposed to have an oncogenic role, although its
exact function in tumorigenesis remains to be determined.
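The subgroup-enrichment test reported for DDX3X (two-tailed Fisher’s exact test with Bonferroni correction) can be sketched on a 2×2 table. The table counts mutated versus wild-type cases inside and outside a subgroup. The helper names are hypothetical. The two-sided P value follows the common convention of summing all tables no more likely than the one observed. The number of tests to correct for is left to the caller, because the text does not state it.

```python
import math
from typing import List


def fisher_exact_two_sided(table: List[List[int]]) -> float:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the P value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = {x: math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so equally likely tables are not lost to rounding.
    return min(1.0, sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7)))


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)
```

A typical call would be `bonferroni(fisher_exact_two_sided([[mut_in, wt_in], [mut_out, wt_out]]), n_subgroups)`, where all four counts and the number of tests are supplied by the analyst. `scipy.stats.fisher_exact` offers an established implementation.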

As anticipated from previous studies, SHH tumours frequently
showed loss of the whole of chromosome arm 9q, as well as alterations 
in key hedgehog-pathway signalling molecules (for example, PTCH1, 
altered in 8 cases; MYCN, amplified in 5 cases; and SMO, mutated in 
ICGC_MB12). 

The most frequently mutated gene in Group 3 tumours was
SMARCA4 (3 out of 26 cases). As with DDX3X, these mutations were
clustered in the helicase domain (Supplementary Fig. 7a). As noted
above, tetraploidy was also a common event in this subgroup and in
Group 4 tumours. Recurrent truncating mutations were also seen in Group 4 (4 out of 40, 10%) in KDM6A
(also known as UTX), which encodes a histone 3 lysine 27 (H3K27) demethylase. KDM6A lies on
chromosome X, which frequently shows copy-number loss in female
Group 3 and 4 medulloblastoma patients. These mutations suggest a tumour-suppressive role in
this subgroup, as previously described for other cancers. CTDNEP1
(a homologue of the Xenopus gene dullard) was also affected by
truncating alterations in four tumours. In three of these cases, the mutation
was accompanied by loss of the wild-type allele through isodicentric
17q formation. This gene, encoding a nuclear envelope phosphatase,
was shown in Xenopus to have roles in BMP signalling and neural
development. In mammalian cells it is involved in the lipin activation
pathway, regulating nuclear membrane biogenesis and production of
diacylglycerol. Given the high frequency of isodicentric 17q in
medulloblastoma, genetic targets on chromosome 17 have long been
sought. CTDNEP1 may be a good candidate for one of the
medulloblastoma tumour suppressors on 17p.

Aside from these subgroup-enriched events, a theme recurring
across all medulloblastomas is the alteration of genes involved in
chromatin modification. Some point mutations and DNA copy-number
alterations in this pathway have previously been implicated in
medulloblastoma. Overall, 45 out of 125 cases (36%) harboured a
mutation in a gene categorized under the Gene Ontology term
‘Chromatin Modification’ (GO:0016568; Supplementary Fig. 6f, g).

We recently described an enrichment of catastrophic DNA rearrangements
(‘chromothripsis’) in TP53-mutated SHH medulloblastomas
(ref. 6). Three new TP53-mutant SHH tumours were identified in this


Figure 3 | Identification of novel fusion genes in medulloblastoma. a, Read-depth
plot with log₂ tumour:germline coverage ratio showing alterations on
chromosome 7 in ICGC_MB34. Lines indicate connected segments. b, Schematic
of the rearrangement. c, Details of the SHH fusion gene structure and support for
its expression, derived from RNA sequencing data. aa, amino acids.
study: ICGC_MB23 (germline mutation), MBRep_T29 and
MBRep_T53 (somatic mutations). Two of these, ICGC_MB23 and
MBRep_T53, showed complex genomic rearrangements indicative
of chromothripsis (Supplementary Fig. 8).

Deep sequencing also allowed fine mapping of two amplicons on 
chromosome 7 in ICGC_MB34 (a SHH tumour with a somatic TP53 
mutation, relating to MB2034 in ref. 6). One amplicon included the 
entire SHH gene, whereas the second disrupted DNAJB6, such that its 
first exon was juxtaposed to SHH (Fig. 3a, b). RNA sequencing further 
revealed a novel fusion transcript, not expected from the DNA data, 
containing the first exon of DNAJB6 and exons 2 and 3 of SHH. The 
first exon of SHH was skipped, resulting in a predicted amino-terminally 
truncated SHH protein (Fig. 3c). Expression of SHH was extremely high
in this case, whereas it was virtually absent in 301 other medulloblastomas
(Supplementary Fig. 9a). Predicted DNA and RNA junctions were
validated by PCR (Supplementary Fig. 9b). 

Several additional in-frame gene fusions were identified by large-insert
mate-pair sequencing, which gives better resolution for structural
variant detection. ICGC_MB18, for example, carried an intrachromosomal
translocation resulting in a fusion between LCLAT1 and
ERBB4, the latter of which has previously been associated with
medulloblastoma oncogenesis (Supplementary Fig. 9c–f). In ICGC_MB6, a
complex rearrangement of fragments from chromosomes 1 and 17
produced a fusion between MLLT6 and MRPL45, which encodes a mitochondrial
ribosomal protein. The fusion resulted in strong overexpression of MRPL45
(Supplementary Fig. 10a–c). These findings suggest that gene fusions
involving well-established medulloblastoma oncogenes may have a
more important role in medulloblastoma than previously recognized,
and warrant further investigation.

High-coverage, strand-specific RNA sequencing of 28 cases allowed 
us to determine the proportion of DNA SNVs that were observable in 
the transcriptome (Supplementary Tables 3 and 4). Overall, 129 out of 
268 (48%) non-synonymous mutations in the DNA were also detectable
at the RNA level. A further 38% (101 out of 268) resided in genes
expressed at extremely low abundance (reads per kilobase of exon
model per million mapped reads (RPKM) < 1). Thus, the number of
expressed mutations is even smaller than the already low number of
DNA alterations, supporting the hypothesis that very few driving hits
are needed to generate this paediatric tumour. It may also be the case
that some mutations required for tumour initiation are not essential
for later tumour cell maintenance.
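The RPKM threshold used to flag weakly expressed genes follows the standard definition: reads × 10⁹ / (exon length in bp × total mapped reads). The sketch below computes RPKM and applies a hypothetical three-way classification that mirrors the categories above. The `min_alt_reads` criterion for calling a mutation detectable in RNA is an assumption, not the study’s rule.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


def classify_mutation(alt_rna_reads: int, gene_rpkm: float,
                      min_alt_reads: int = 1, low_rpkm: float = 1.0) -> str:
    """Hypothetical three-way classification of a DNA-level mutation."""
    if alt_rna_reads >= min_alt_reads:
        return "detected in RNA"
    if gene_rpkm < low_rpkm:
        return "gene expressed at very low abundance"
    return "gene expressed, mutant allele not observed"


print(rpkm(1_000, 2_000, 10_000_000))  # 50.0
print(f"{129 / 268:.0%} detected, {101 / 268:.0%} in low-abundance genes")
```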

RNA sequencing further revealed monoallelic expression of a
heterozygous mutation in TBR1, producing a p.G275C change, which
was also seen in a previous study (Supplementary Fig. 11a). TBR1
encodes a T-box transcription factor involved in brain development.
This gene and a second family member, EOMES (or TBR2), clearly
showed subgroup-specific differential expression (Fig. 4a). Sequencing
of TBR1 exon 2 in a further 85 medulloblastomas revealed one additional
case with an identical mutation. All three mutated tumours were
in Group 4. Gene expression was also strongly correlated with DNA 
methylation for both TBR1 and EOMES (Fig. 4b, c and Supplementary 
Fig. 11b, c), and expression of TBR1 and EOMES is inversely correlated 
Figure 4 | Integration of mutation, expression and methylation data shows
differential regulation of TBR1 and EOMES in medulloblastoma. 

a, Microarray data showing clear differences in TBR1 and EOMES expression 
between medulloblastoma subgroups (n = 301). b, DNA methylation of TBR1 
(n = 54), ranging from low (blue) to high (red). Horizontal red bar indicates the 
region used for correlation analysis in c. c, Expression of TBR1 is tightly
correlated with gene methylation (n = 54; Pearson’s correlation values, r). SHH
tumours show high methylation and virtually no expression, whereas WNT, 
Group 3 and Group 4 tumours display a more varied pattern. d, Expression 
levels of TBR1 (diamonds) and EOMES (circles) are inversely related in Group
4 tumours (n = 104).
in Group 4 tumours (Fig. 4d), giving subsets that are either TBR1-methylated
and EOMES-expressing or EOMES-methylated and TBR1-expressing
(Supplementary Fig. 11d, e). These two genes are markers for different
stages of neuronal lineage commitment, suggesting possible differences
in cell of origin or differentiation within Group 4 subpopulations.

This large, integrative genomics study has provided detailed insight
into new mechanisms contributing to medulloblastoma tumorigenesis
and has revealed novel targets for therapeutic approaches, especially for
Group 3 and 4 patients. The molecular subgroup-related enrichment
of many alterations highlights the importance of considering this
distinguishing factor in research, trial design and clinical practice.


METHODS SUMMARY 


All patient material was collected after receiving informed consent according to 
ICGC guidelines and as approved by the institutional review board of contributing 
centres. Tumour subgrouping was based on gene expression profiling or
immunohistochemical analysis as described in ref. 5.

Next-generation sequencing was performed using Illumina technologies. Mean
DNA sequence coverage was 35-fold for whole-genome cases (range 26–56×),
whereas mean on-target coverage in the whole-exome and replication cohorts was
68-fold (74% of targets above 20× for whole exome, 66% for the replication
cohort). Exome capture was carried out with Agilent SureSelect (Human All
Exon 50 Mb and XT Custom Library) in-solution reagents. Sequence data were
aligned to the hg19 human reference genome assembly; duplicate and
non-uniquely mapping reads were excluded. Tumour ploidy was predicted from
sequencing data by a novel approach integrating copy-number aberrations with
allele frequencies. A subset of sequence variants was validated using PCR and
Sanger sequencing. Verification rates were 95% (128 out of 135) for SNVs and
100% (14 out of 14) for Indels (Supplementary Tables 3 and 4). A complete
description of the materials and methods is provided in the Supplementary 
Information. 
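The coverage and verification summaries above reduce to simple aggregations. The sketch below assumes that mean depths per capture target are already available, and it reads ‘above 20×’ as ≥ 20×. The function name and the example depths are hypothetical.

```python
from typing import Sequence, Tuple


def coverage_summary(target_depths: Sequence[float],
                     threshold: float = 20.0) -> Tuple[float, float]:
    """Mean depth over targets and the fraction of targets at >= threshold."""
    if not target_depths:
        raise ValueError("no targets supplied")
    n = len(target_depths)
    mean_depth = sum(target_depths) / n
    frac_covered = sum(d >= threshold for d in target_depths) / n
    return mean_depth, frac_covered


print(coverage_summary([8, 22, 35, 95]))  # (40.0, 0.75)
print(f"SNV verification rate: {128 / 135:.0%}")
```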
Targeting nuclear RNA for in vivo correction of myotonic dystrophy


Thurman M. Wheeler, Andrew J. Leger, Sanjay K. Pandey, A. Robert MacLeod, Masayuki Nakamori, Seng H. Cheng,
Bruce M. Wentworth, C. Frank Bennett & Charles A. Thornton


Antisense oligonucleotides (ASOs) hold promise for gene-specific
knockdown in diseases that involve RNA or protein gain-of-function
effects. In the hereditary degenerative disease myotonic
dystrophy type 1 (DM1), transcripts from the mutant allele contain
an expanded CUG repeat and are retained in the nucleus. The
mutant RNA exerts a toxic gain-of-function effect, making it
an appropriate target for therapeutic ASOs. However, despite
improvements in ASO chemistry and design, systemic use of
ASOs is limited because uptake in many tissues, including skeletal
and cardiac muscle, is not sufficient to silence target messenger
RNAs. Here we show that nuclear-retained transcripts containing
expanded CUG (CUG^exp) repeats are unusually sensitive to
antisense silencing. In a transgenic mouse model of DM1, systemic
administration of ASOs caused a rapid knockdown of CUG^exp
RNA in skeletal muscle, correcting the physiological, histopathologic
and transcriptomic features of the disease. The effect was
sustained for up to 1 year after treatment was discontinued.
Systemically administered ASOs were also effective for muscle
knockdown of Malat1, a long non-coding RNA (lncRNA) that is
retained in the nucleus. These results provide a general strategy to
correct RNA gain-of-function effects and to modulate the expression
of expanded repeats, lncRNAs and other transcripts with prolonged
nuclear residence.

Antisense silencing by the RNase H-dependent mechanism entails a
three-step process: oligonucleotide hybridization to its cognate
RNA, cleavage of the target by RNase H1 and exonuclease degradation
of the cleavage fragments. The relative efficiency of this mechanism in
the nucleus and cytoplasm is uncertain. Observations that ASOs
shuttle from cytoplasm to nucleus and that targeting intronic
sequences or nuclear RNA can silence gene expression indicate
that antisense is active in the nucleus. However, activity in the
cytoplasm is less clear. Although RNase H1 is not restricted to the
nucleus, recent studies indicate that the non-nuclear fraction is
confined to mitochondria. This suggests that ASO–RNase H cleavage is
mainly a nuclear process, whose potency could be maximized by
targeting transcripts with long nuclear residence.

To test this idea, we used a transgenic mouse model of DM1. HSA^LR
transgenic mice express CUG^exp RNA at high levels in skeletal muscle.
Human DM1 is caused by an expanded CTG repeat in the 3’ untranslated
region (UTR) of dystrophia myotonica-protein kinase (DMPK),
whereas in HSA^LR mice the expanded repeat is in the 3’ UTR of a
human skeletal actin (hACTA1) transgene. In both conditions the
CUG^exp transcripts are retained in nuclear foci, along with splicing
factors in the muscleblind-like (MBNL) protein family. Muscleblind
sequestration leads to misregulated alternative splicing and other
changes in the muscle transcriptome. The RNA toxicity was
mitigated in mice by CAG-repeat morpholino oligomers that compete
with MBNL proteins for CUG^exp binding, without activating RNase H.
However, this approach required direct injection into a single muscle,
followed by in vivo electroporation, a method to load muscle fibres
with oligomers. As an alternative, RNase H-active ASOs could produce
widespread correction, provided that uptake of circulating ASOs
was sufficient to induce target cleavage.

We identified ASOs showing strong knockdown of hACTA1 in
tissue culture, good tolerability when systemically administered in
wild-type mice, and activity against hACTA1-CUG^exp transcripts
in vivo when electroporated into muscle (Supplementary Figs 1–3).
The ASOs had 2’-O-methoxyethyl (MOE) modifications at both
ends to maximize biostability, and a central gap of 10 unmodified
nucleotides to support RNase H activity (MOE gapmers; Supplementary
Table 1). We tested three of the ASOs in HSA^LR transgenic mice by
subcutaneous injection of 25 mg kg⁻¹ twice weekly (Fig. 1a). After 4
weeks of administration (8 injections), ASO 445236 reduced the level
of CUG^exp RNA in hindlimb muscles by more than 80% (Fig. 1b).
Another ASO targeting the 3’ UTR, downstream of the repeat tract,
also showed strong CUG^exp reduction. In contrast, an ASO targeting the
5’ UTR and three oligonucleotides against other targets had no effect
(Fig. 1b, c).
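The gapmer layout described above, with MOE-modified wings flanking a central gap of 10 unmodified nucleotides, can be illustrated with a short sequence-handling sketch. It assumes a 20-nucleotide ASO with 5-nucleotide wings, a common ‘5-10-5’ design; the wing length is not stated in the text. The target is a hypothetical sequence, not one of the hACTA1 sites targeted by ASO 445236 or the other ASOs tested. The function names are also hypothetical.

```python
# Watson-Crick partner (as DNA) for each RNA base of the target.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(target_rna: str) -> str:
    """Antisense DNA sequence (5'->3') that hybridizes to a 5'->3' RNA target."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target_rna.upper()))


def gapmer_layout(target_rna: str, wing: int = 5, gap: int = 10):
    """Split the antisense sequence into 5' wing, central gap and 3' wing.

    In a MOE gapmer the wing nucleotides carry 2'-O-methoxyethyl groups for
    biostability, while the unmodified central gap lets RNase H cleave the
    RNA strand of the ASO-RNA duplex.
    """
    aso = antisense_dna(target_rna)
    if len(aso) != 2 * wing + gap:
        raise ValueError(f"expected {2 * wing + gap} nt, got {len(aso)}")
    return aso[:wing], aso[wing:wing + gap], aso[wing + gap:]


target = "AUGGCUACGUUAGCCAUGAC"  # hypothetical 20-nt target site
print("-".join(gapmer_layout(target)))  # GTCAT-GGCTAACGTA-GCCAT
```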

RNase H cleavage of mRNA is usually followed by rapid decay of
cleavage fragments. However, stable cleavage fragments are observed
occasionally, and the CUG^exp tract forms extensive hairpins and
ribonucleoprotein complexes that could inhibit exonuclease activity.
The failure of antisense targeting in the 5’ UTR also raised the
possibility that cleavage downstream of the repeat tract was required
for efficient silencing. We therefore tested an additional ASO, 190401,
targeting the hACTA1 coding region, and found that it was also highly
effective (Fig. 1d). Furthermore, northern blot analysis using a
CAG-repeat probe showed no evidence of a stable CUG^exp cleavage
fragment (Fig. 1e), and in situ hybridization showed reduction of nuclear
CUG^exp foci (Supplementary Fig. 4). These results indicate that
expanded CUG repeats are degraded after a cleavage event 5’ or 3’
of the repeat tract.

Reduction of CUG^exp RNA would be expected to release sequestered
MBNL1 protein and improve its splicing regulatory activity.
Consistent with this prediction, alternative splicing of four MBNL1-dependent
exons was normalized (Fig. 1f, g and Supplementary Figs 5 and 6): Serca1 (also known as Atp2a1) exon 22, titin (Ttn)
exon 362, Zasp (also known as Ldb3) exon 11, and Clcn1 chloride ion
channel exon 7a. The Clcn1 splicing defect causes loss of channel function,
repetitive action potentials and delayed muscle relaxation (myotonia),
a cardinal feature of the disease. Blinded analysis showed that myotonic
discharges in hindlimb muscles were eliminated by the active ASOs
(Fig. 1h), indicating rescue of Clcn1 function.
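Splicing outcomes from RT-PCR gels are commonly summarized as the percentage of transcripts that include the alternative exon. The sketch below is a hypothetical helper, not the quantification used in the study. Its optional length correction assumes that dye signal scales with amplicon length.

```python
from typing import Optional


def percent_spliced_in(inclusion: float, exclusion: float,
                       inclusion_len: Optional[int] = None,
                       exclusion_len: Optional[int] = None) -> float:
    """Percentage of transcripts that include the alternative exon.

    `inclusion` and `exclusion` are band intensities for the two RT-PCR
    products. If both product lengths are given, each intensity is divided
    by its length, assuming dye signal scales with amplicon size.
    """
    if inclusion_len and exclusion_len:
        inclusion, exclusion = inclusion / inclusion_len, exclusion / exclusion_len
    total = inclusion + exclusion
    if total <= 0:
        raise ValueError("no signal for either product")
    return 100.0 * inclusion / total
```

For Clcn1, only the exon 7a-excluded isoform encodes a functional channel. A falling exon 7a inclusion value after treatment therefore corresponds to more functional channel transcripts.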

In addition to splicing defects, expression of CUG^exp RNA or ablation
of Mbnl1 causes extensive remodelling of the muscle transcriptome.
We used microarrays to examine transcriptomic effects of ASOs.
Principal component analysis showed that gene expression in ASO-treated
HSA^LR mice was shifted towards that of wild-type mice, indicating
an overall trend towards transcriptome normalization (Fig. 2a). Among
transcripts that were up- or downregulated in HSA^LR muscle, more
Figure 1 | Systemic administration of 2’-O-(2-methoxyethyl) ASOs in the
HSA^LR transgenic mouse model of DM1. a, Location of ASO-targeting
sequences relative to the hACTA1 coding region and the expanded CUG repeat
in the 3’ UTR. b, Quantitative real-time RT-PCR of hACTA1-CUG^exp mRNA
in quadriceps (Quad), gastrocnemius (Gastroc) and tibialis anterior (TA)
muscle in HSA^LR mice treated with the indicated ASOs by subcutaneous
injection of 25 mg kg⁻¹ twice weekly for 4 weeks. Muscle tissue was obtained
1 week after the final dose (n = 4 per group). The mean levels of transgene
mRNA ± s.d. are shown. **P < 0.001, ***P < 0.0001 (one-way analysis of
variance (ANOVA)). c, hACTA1-CUG^exp transcript levels in quadriceps are
not affected by ASOs targeting unrelated transcripts (141923, randomer;
116847, Pten; 399462, Malat1; n = 4 per group; same dose as in b). Error bars
are mean ± s.d. d, Knockdown of hACTA1-CUG^exp mRNA in muscle by ASO


than 85% were normalized or partially corrected by ASOs, without
evidence of off-target effects (Fig. 2b, Supplementary Fig. 7 and
Supplementary Table 2). These results confirm that ASOs caused an
overall improvement of the muscle transcriptome.
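One way to sort dysregulated transcripts into ‘normalized’, ‘partially corrected’ and ‘not corrected’ is to measure how much of the disease-associated log fold change treatment removes. The thresholds below (80% and 20%) and the function names are hypothetical, chosen only for illustration; the criteria used in the study are not given here.

```python
from typing import Iterable, Tuple


def correction_fraction(disease_lfc: float, treated_lfc: float) -> float:
    """Share of a transcript's disease-associated change removed by treatment.

    Both arguments are log2 fold changes relative to wild type; 1.0 means the
    wild-type level was restored, 0.0 means no change from untreated disease.
    """
    if disease_lfc == 0:
        raise ValueError("transcript is not dysregulated")
    return 1.0 - treated_lfc / disease_lfc


def classify(disease_lfc: float, treated_lfc: float,
             full: float = 0.8, partial: float = 0.2) -> str:
    """Label a transcript using hypothetical correction thresholds.

    Overshoot past wild type (fraction > 1) is counted as normalized here.
    """
    frac = correction_fraction(disease_lfc, treated_lfc)
    if frac >= full:
        return "normalized"
    if frac >= partial:
        return "partially corrected"
    return "not corrected"


def share_improved(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of transcripts that are normalized or partially corrected."""
    labels = [classify(d, t) for d, t in pairs]
    if not labels:
        raise ValueError("no transcripts supplied")
    return sum(lbl != "not corrected" for lbl in labels) / len(labels)
```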

ASO effects were evident within 2 weeks (Supplementary Fig. 8) and
were dose-dependent. A threefold dose reduction (8.5 mg kg⁻¹
twice weekly for 4 weeks) caused partial correction of myotonia and splicing,
whereas a tenfold dose reduction (2.5 mg kg⁻¹) caused partial
myotonia correction in tibialis anterior but not in quadriceps
190401 (n = 4 per group; same dose as in b). Error bars are mean ± s.d.
***P = 0.0005 (t-test). e, Northern analysis of RNA from quadriceps muscle.
CUG^exp RNA was detected using a (CAG)g oligonucleotide probe. Mouse actin
serves as loading control. f, g, RT-PCR analysis of alternative splicing of Clcn1
(f) and Serca1 (g) transcripts. For Clcn1, only the -ex7a isoform encodes a
functional ion channel. -ex7a, exon 7a exclusion; +ex7a, exon 7a inclusion;
-ex22, exon 22 exclusion; +ex22, exon 22 inclusion; neg, negative control mice
injected with GAC25 morpholino; pos, positive control mice injected with
CAG25 morpholino; WT, FVB/N wild-type mice. h, Blinded analysis of myotonia
using EMG, 1 week after the final dose (n = 4 mice per group). Error bars are
mean ± s.d. ***P < 0.0001 for ASO-treated versus saline-treated muscles
(two-way ANOVA).


(Supplementary Fig. 9a–c), the latter muscle having higher basal levels
of CUG^exp expression. Serum chemistries showed no evidence of
renal or liver toxicity (Supplementary Fig. 10).
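The weight-normalized doses above translate into per-animal amounts with simple arithmetic. The 25 g body weight below is a hypothetical value for illustration, and the function names are also hypothetical.

```python
def dose_per_injection_mg(dose_mg_per_kg: float, body_weight_g: float) -> float:
    """Amount of ASO in one injection for an animal of the given weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def cumulative_dose_mg_per_kg(dose_mg_per_kg: float, injections_per_week: int,
                              weeks: float) -> float:
    """Total weight-normalized dose over a dosing course."""
    return dose_mg_per_kg * injections_per_week * weeks


BODY_WEIGHT_G = 25.0  # hypothetical mouse body weight
for dose in (25.0, 8.5, 2.5):  # full, threefold-reduced and tenfold-reduced doses
    print(f"{dose} mg/kg: {dose_per_injection_mg(dose, BODY_WEIGHT_G):.3f} mg per injection, "
          f"{cumulative_dose_mg_per_kg(dose, 2, 4):.0f} mg/kg over 4 weeks")
```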

A uniform finding in previous studies of MOE gapmer ASOs was
that systemic administration failed to cause significant target reduction
in muscle, despite efficient knockdown in liver (n = 12 different
mRNA targets; Supplementary Table 3). This raised the possibility that
muscle tissue in our model is unusually susceptible to antisense
silencing. We examined the functional integrity of the muscle
caused HSA^LR transgenic mice to cluster nearer to wild-type mice
(25 mg kg⁻¹ twice weekly for 4 weeks; n = 4 mice per group). b, Of the transcripts
upregulated in HSA^LR versus wild-type mice (saline), >85% showed
complete or partial return to normal expression after treatment with ASOs
(n = 4 mice per group).


membrane, a physiological barrier to ASO uptake, and found that
muscle penetration of the extracellular dye Evans Blue was similar in
HSA^LR and wild-type mice (Supplementary Fig. 11a). Direct analysis of
muscle tissue indicated that ASO accumulation was no greater in
HSA^LR mice than in wild-type controls (Supplementary Fig. 11b, c).
Likewise, the mRNA level for RNase H1 was similar in HSA^LR and
wild-type muscle (Supplementary Fig. 12). We tested ASOs targeting
other muscle-expressed transcripts. ASOs for Pten phosphatase or
Srb1 (also known as Scarb1) scavenger receptor showed efficient target
knockdown in liver, but no appreciable knockdown in HSA^LR or
wild-type muscle (Fig. 3a). Taken together with previous studies, our results
indicate specific sensitivity of hACTA1-CUG^exp transcripts rather than
a general enhancement of ASO activity in HSA^LR muscle.

A notable metabolic feature of hACTA1-CUG^exp and human
DMPK-CUG^exp mRNA is that processing and polyadenylation are
normal but the transcripts are retained in the nucleus. Recent studies
have shown that RNase H1, the enzyme responsible for antisense
knockdown, is localized to the nucleus and mitochondria. This suggests
that antisense cleavage of nuclear-encoded RNA occurs before nuclear
export, and raises the possibility that nuclear-retained transcripts
may exhibit enhanced sensitivity. To determine whether other
nuclear-retained transcripts show a similar effect, we examined
metastasis-associated lung adenocarcinoma transcript 1 (Malat1), an
endogenous nuclear lncRNA. We identified MOE gapmer ASOs that
produced strong Malat1 knockdown in cells, in an RNase H1-dependent
manner (Supplementary Fig. 13). In wild-type and HSA^LR mice,
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Figure 3 | Differential sensitivity of transcripts to ASO knockdown in 
skeletal muscle. a, In HSA“ or FVB/N wild-type mice, ASOs targeting Srb1 
(353382) or Pten (116847) were effective for knockdown in liver but not in 
quadriceps muscle (qRT-PCR, n = 4 per group). Error bars are mean = s.d. 
*P = 0.02, ***P < 0.0001 (t-test). b, HSA*® and FVB/N wild-type mice were 
treated with ASO 399462 targeting Malat1, a nuclear-retained IncRNA. Levels 
of Malat1 transcript in the indicated tissues were determined by (RT-PCR 
(n = 4 ASO, 3 saline). Error bars are mean + s.d. *P = 0.035, **P < 0.007, *** 
P= 0.001 for ASO versus saline (t-test). c, Dose response of Malat1 knockdown 
in BALB/c wild-type mice. BALB/c wild-type mice were treated with saline or 
ASO 399462 targeting Malat! at 12.5, 25 and 50 mgkg ‘ twice per week for 
3.5 weeks (7 doses in total; n = 4 per group). Tissues were collected for RNA 
isolation 2 days after the final dose. Malat1 transcript levels were determined by 
qRT-PCR. Error bars are mean + s.e.m. *P < 0.01, **P < 0.001, 

* P< 0.0001 (two-way ANOVA). 


subcutaneous administration of ASOs for 4 weeks caused a greater 
than 80% reduction in Malatl1 in muscle (Fig. 3b, c), supporting the 
idea that nuclear-retained transcripts have enhanced sensitivity. 

To determine the duration of ASO action in muscle, we examined mice at 15 and 31 weeks after ASO was discontinued, and found that hACTA1-CUG^exp knockdown and splicing correction remained strong (not shown). One year after ASO injection was discontinued, target reduction by ASO 190401 had waned, but remained approximately 50% or more for ASO 445236 (Fig. 4a). Even at this late time point, the appropriate cleavage products were detected by amplification of complementary DNA 5′ ends (5′ RACE), indicating persistent ASO–RNase H1 activity (Fig. 4b). Consistent with the extent of target reduction, there was a partial return of myotonia and splicing defects for ASO 190401, whereas correction by ASO 445236 remained strong (Fig. 4c, d and Supplementary Fig. 14a–e). Furthermore, the persistent knockdown of CUG^exp RNA largely prevented the age-dependent myopathic changes in HSA^LR muscle, as evidenced by a reduced frequency of central nuclei (Fig. 4e) and improved muscle-fibre diameter (mainly a prevention of fibre atrophy) (Supplementary Fig. 15). These findings indicate that ASO activity against hACTA1-CUG^exp in muscle is remarkably durable and that long-term reduction of the toxic RNA can protect against structural changes in muscle fibres. Notably, Malat1 knockdown in muscle was also prolonged (a greater than 50% reduction at 31 weeks after ASO discontinuation) and more persistent than in liver (Supplementary Fig. 16).

Figure 4 | Duration of ASO activity and in vivo targeting of human DMPK. a–e, Two-month-old HSA^LR mice received saline or ASO by subcutaneous injection of 25 mg kg^-1 twice weekly for 4 weeks (n = 5 for each ASO, n = 6 for saline), with tissues isolated 1 year after the final dose. qRT-PCR analysis of HSA^LR transgene mRNA (mean ± s.d.) was normalized to mRNA of the housekeeping gene Gtf2b (a). Results were similar when normalized to total RNA input. 5′ RACE was carried out on muscle RNA obtained 1 week or 1 year after discontinuation of ASO 190401 or 445236 treatment (b). PCR products (5′ RACE fragment) migrated at the expected position for ASO–RNase H cleavage products, and were confirmed by DNA sequencing. Quantification of Serca1 splicing (c) and myotonia (d) showed a partial return of splicing defects and myotonia for ASO 190401 but not ASO 445236. Myotonia was graded blind by the examiner (d). After prolonged knockdown of toxic RNA, the number of internal nuclei per muscle fibre was determined by histologic analysis when mice were aged 14 months (e) (n = 4 for ASO 445236; n = 3 for saline; WT, untreated 3-month-old FVB/N wild-type control mice). f, DM328XL mice received subcutaneous injections of saline or ASO 445569 targeting the 3′ UTR of hDMPK. The ASO dose was 50 or 75 mg kg^-1 twice weekly for 4 weeks (n = 5, low dose; n = 4, high dose; n = 2, saline). Tissues were isolated 2 days after the final dose. Dose-dependent reduction of hDMPK, normalized to mRNA of the housekeeping gene Gtf2b. Note that hDMPK mRNA was undetectable in wild-type mice. No RT, no reverse transcriptase. WT, untreated wild-type littermates of DM328XL transgenic mice (n = 2). Error bars are mean ± s.d. *P < 0.05, **P < 0.001, ***P < 0.0001 for ASO-treated versus saline-treated muscle (two-way ANOVA).

Therapeutic application of this strategy to human DM1 will require 
transfer of the targeting sequence to hDMPK. We developed MOE 
gapmer ASOs that were active against hDMPK in cells. We examined 
in vivo activity after 4 weeks of twice weekly subcutaneous injection in 
transgenic mice that express hDMPK with 800 CUG repeats. The ASO 
produced significant knockdown of hDMPK-CUG^exp transcripts in
hindlimb muscle (Fig. 4f and Supplementary Figs 17 and 18), support- 
ing the feasibility of silencing the pathogenic DMPK allele. 

Despite physiological barriers to tissue uptake, our results indicate that systemic targeting of CUG^exp RNA is feasible, because the small amounts of ASO that enter muscle fibres can hybridize to their target and productively engage RNase H1. Although the mechanisms for the enhanced sensitivity of CUG^exp RNA and Malat1 are not fully defined, our data suggest that residence time in the nucleus is an important determinant of transcript sensitivity. Features of the nuclear environment that may enhance antisense activity include nuclear localization of RNase H1 (ref. 14), auxiliary proteins that promote oligonucleotide hybridization and (in the case of CUG^exp transcripts) spatial concentration of targets in a small volume. A similar approach may be effective for other genetic disorders that have nuclear accumulation of repeat expansion RNA. Previous studies have used CAG-repeat ASOs that bind CUG^exp RNA without activating RNase H, in an effort to block the protein interactions or modify the metabolism of the toxic RNA. Although this approach was effective with local delivery, initial attempts at systemic delivery were less successful (T.M.W. and C.A.T., unpublished observations). This fits with the expectation that higher tissue concentrations of ASO are required to occupy CUG^exp binding sites than to induce RNase H cleavage. Furthermore, the RNase H mechanism is attractive because it exploits the nuclear retention phenomenon to gain a therapeutic advantage, while posing less risk of off-target effects by avoiding a repetitive sequence. Recently, local delivery of RNase H-active CAG-repeat ASOs induced partial CUG^exp knockdown but was accompanied by muscle damage, again suggesting that direct targeting of the repeat tract may have pitfalls. Our results also suggest that ASOs are useful for in vivo functional characterization and therapeutic modulation of lncRNAs, a large and recently recognized class of regulatory RNAs.


METHODS SUMMARY 

Experimental mice. All animal experiments were approved by the Institutional 
Animal Care and Use Committees at the University of Rochester, Genzyme 
Corporation and Isis Pharmaceuticals. 

Subcutaneous injection of ASOs. MOE gapmer ASOs were dissolved in saline 
and administered by subcutaneous injection in the interscapular region twice per 
week at the indicated doses. 

Quantitative real-time RT-PCR (polymerase chain reaction with reverse tran- 
scription) assay. Total RNA was purified from muscle using RNeasy Lipid Tissue 
Mini Kits (Qiagen). mRNA levels for ACTA1, Srb1, Pten, Malat1 and RNase H1 
were determined on the Applied Biosystems 7500 System using 18S rRNA as 
normalization control. General transcription factor 2b (Gtf2b) and total RNA
(Ribogreen assay) served as normalization controls for human DMPK and mouse 
Dmpk. 

Northern analysis. CUG^exp sequences were detected using a 32P end-labelled CAG-repeat DNA oligonucleotide probe.

Electromyography. Electromyography (EMG) was carried out blind under general anaesthesia, as described previously.

RT-PCR analysis of alternative splicing. RT-PCR was carried out using the 
SuperScript III One-Step RT-PCR System with Platinum Taq DNA Polymerase
(Invitrogen) and the same gene-specific primers for cDNA synthesis and PCR 
amplification. PCR products were separated on agarose gels, stained with 
SybrGreen I Nucleic Acid Gel Stain (Invitrogen) and scanned with a fluorimager. 

Transcriptome analysis. Quadriceps-muscle RNA from wild-type or HSA^LR
transgenic mice treated with vehicle (saline), ASO 445236 or ASO 190401 was 
processed to cRNA and hybridized on microbeads using MouseRef-8 v2.0 
Expression BeadChip Kits (Illumina). Image data were quantified using 
BeadStudio software (Illumina). 
Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature. 
METHODS


Antisense oligonucleotides. ASOs were synthesized at Isis Pharmaceuticals, as described previously. All ASOs were MOE gapmer 20mers with phosphorothioate as the intersubunit linkage, 2′-O-(2-methoxyethyl) (MOE) modifications of 5 nucleotides at the 5′ and 3′ ends, and a central gap of 10 deoxynucleotides. The sequence of each ASO is listed in Supplementary Table 1. CAG25 and GAC25 morpholinos were purchased (Gene Tools).
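For readers less familiar with gapmer nomenclature, the 5-10-5 layout described above can be written out explicitly. The helper below is a hypothetical illustration. It is not an Isis Pharmaceuticals tool, and the 20-mer is invented rather than taken from Supplementary Table 1. It simply marks the MOE wings in lowercase and the central DNA gap in uppercase, a notation chosen here only for convenience. Phosphorothioate linkages are assumed throughout and are not marked.

```python
def gapmer_layout(seq: str, wing: int = 5, gap: int = 10) -> str:
    """Annotate a gapmer: MOE wings in lowercase, DNA gap in uppercase.

    Hypothetical helper for a 5-10-5 MOE gapmer; every linkage is assumed
    to be phosphorothioate, so only the sugar chemistry is indicated.
    """
    seq = seq.upper()
    if len(seq) != 2 * wing + gap:
        raise ValueError(f"expected {2 * wing + gap} nt, got {len(seq)}")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G and T")
    five_wing = seq[:wing].lower()          # 2'-MOE nucleotides at the 5' end
    dna_gap = seq[wing:wing + gap]          # deoxynucleotides that support RNase H1 cleavage
    three_wing = seq[wing + gap:].lower()   # 2'-MOE nucleotides at the 3' end
    return f"{five_wing}-{dna_gap}-{three_wing}"


# Invented 20-mer, used purely as an example.
print(gapmer_layout("ACGTACGTACGTACGTACGT"))
```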

Identification of active ASOs. The criteria for identifying active hACTA1-targeting ASOs were as follows: first, selection of targeting sequences that were not conserved in mice, to avoid knockdown of endogenous skeletal actin; second, >50% hACTA1 knockdown when electroporated in HepG2 cells (Supplementary Fig. 1); and third, absence of hepatotoxic or immunostimulatory effects in wild-type mice when 50 mg kg^-1 was injected subcutaneously twice weekly for 4 weeks (Supplementary Fig. 2a–c). Out of 11 candidate ASOs examined, 5 satisfied these criteria. For the ASO with the highest activity in HepG2 cells, we also verified activity against hACTA1-CUG^exp transcripts in vivo by direct injection and electroporation of tibialis anterior muscle in HSA^LR mice (Supplementary Fig. 3). Four of the five ASOs were subsequently used for subcutaneous administration in HSA^LR mice. ASOs targeting Malat1 were identified by demonstration of >50% target knockdown when electroporated in mouse hepatocellular SV40 large T-antigen carcinoma (MHT) cells, and by absence of hepatotoxic or immunostimulatory effects in wild-type mice (dosing as above).

Cell transfection and gene analysis. HepG2 cells were electroporated in a 96-well plate format at 165 V with ASOs in complete medium containing MEM, non-essential amino acids (NEAA), sodium pyruvate and 10% FBS at room temperature. Cells were incubated overnight and lysed in RLT buffer (Qiagen). Total RNA was prepared using the Qiagen RNeasy kit. Quantitative real-time RT-PCR (qRT-PCR) was performed using the Qiagen QuantiTect Probe RT-PCR kit. Twenty-microlitre qRT-PCR reactions were run in duplicate and normalized against total RNA, which was measured using the Ribogreen assay (Invitrogen).

Experimental mice. Institutional Animal Care and Use Committees at the University of Rochester, Genzyme Corporation and Isis Pharmaceuticals approved all animal experiments. HSA^LR mice in line 20b were derived and maintained on the FVB/N background strain. The (CTG)250 tract in this line is unstable and has shortened to (CTG)229. DM328XL mice carry a 45-kb human genomic fragment that includes the mutant DMPK gene with 800 CTG repeats. The DM328XL mice were hemizygous and displayed no histologic changes, myotonia or splicing defects in skeletal muscle. FVB/N, BALB/c, C57BL/10 and mdx mice were from Jackson Laboratories.

Muscle injection of ASOs. The tibialis anterior muscle was injected with 0.2, 0.4 or 0.8 nmol ASO in 20 µl saline, and the contralateral tibialis anterior with 20 µl saline alone. Both muscles were then electroporated, as described previously. Treatment assignments were randomized and injections were carried out blind.

Subcutaneous injection of ASOs. All ASOs were dissolved in phosphate-buffered saline (PBS). Doses of 2.5, 8.5, 12.5, 25 or 50 mg kg^-1 were injected subcutaneously in the interscapular region twice per week for 3.5 to 4 weeks (7 or 8 doses in total). Injection volumes ranged from 140 to 200 µl.

Real-time PCR Assay. Total RNA was purified from tibialis anterior, gastrocnemius 
or quadriceps muscle using the RNeasy Lipid Tissue Mini Kit (Qiagen) according to 
the manufacturer’s instructions. qRT-PCR was used to determine mRNA levels for
ACTA1, Srb1, Pten, Malat1 and RNase H1, with 18S rRNA as normalization control,
on an Applied Biosystems 7500 Real-Time PCR System. Gtf2b and total RNA 
(Ribogreen assay) served as normalization controls for human DMPK and mouse 
Dmpk. 
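The Methods do not spell out how the normalized qRT-PCR values were converted into the percent reductions quoted in the text. The sketch below makes two assumptions about that step. First, each target quantity is divided by its normalizer quantity (18S rRNA, Gtf2b or Ribogreen total RNA). Second, the ASO-group mean is then expressed relative to the saline-group mean. The function names and input values are hypothetical.

```python
from statistics import mean


def normalized_levels(target, normalizer):
    """Ratio of target to normalizer quantity for each sample (paired lists)."""
    if len(target) != len(normalizer):
        raise ValueError("target and normalizer lists must be paired")
    return [t / n for t, n in zip(target, normalizer)]


def percent_knockdown(aso_target, aso_norm, saline_target, saline_norm):
    """Percent reduction of the ASO-group mean relative to the saline-group mean."""
    aso = mean(normalized_levels(aso_target, aso_norm))
    saline = mean(normalized_levels(saline_target, saline_norm))
    return 100.0 * (1.0 - aso / saline)


# Hypothetical quantities (arbitrary units): 4 ASO-treated and 3 saline-treated mice.
kd = percent_knockdown([10, 20, 15, 15], [100] * 4, [100, 95, 105], [100] * 3)
print(f"{kd:.1f}% knockdown")
```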

Real-time PCR assay primer probe set sequences.
ACTA1 primer probe set 1 (PPset 1): forward, 5′-GTAGCTACCCGCCCAGAAACT-3′; reverse, 5′-CCAGGCCGGAGCCATT-3′; probe, 5′-ACCACCGCCCTCGTGTGCG-3′.
ACTA1 PPset 2: forward, 5′-GACGAGGCTCAGAGCAAGAGA-3′; reverse, 5′-TGATGATGCCGTGCTCGATA-3′; probe, 5′-CCTGACCCTGAAGTAC-3′.
Srb1: forward, 5′-TGACAACGACACCGTGTCCT-3′; reverse, 5′-ATGCGACTTGTCAGGCTGG-3′; probe, 5′-CGTGGAGAACCGCAGCCTCCATT-3′.
Pten: forward, 5′-ATGACAATCATGTTGCAGCAATTC-3′; reverse, 5′-CGATGCAATAAATATGCACAAATCA-3′; probe, 5′-CTGTAAAGCTGGAAAGGGACGGACTGGT-3′.
Malat1: forward, 5′-TGGGTTAGAGAAGGCGTGTACTG-3′; reverse, 5′-TCAGCGGCAACTGGGAAA-3′; probe, 5′-CGTTGGCACGACACCTTCAGGGACT-3′.
RNase H1: forward, 5′-ACTCAGGATTTGTGGGCAATG-3′; reverse, 5′-CCTCAGACTGCTTCGCTCCTT-3′; probe, 5′-AGAGGCCGACAGACTGGCACGG-3′.
Human DMPK: forward, 5′-AGCCTGAGCCGGGAGATG-3′; reverse, 5′-GCGTAGTTGACTGGCGAAGTT-3′; probe, 5′-AGGCCATCCGCACGGACAACCX-3′.
Mouse Dmpk: forward, 5′-GACATATGCCAAGATTGTGCACTAC-3′; reverse, 5′-CACGAATGAGGTCCTGAGCTT-3′; probe, 5′-AACACTTGTCGCTGCCGCTGGCX-3′.
Ap2M1: sequences previously reported.
18S rRNA: proprietary sequences (Applied Biosystems, catalogue number 4310893-E).
Gtf2b: proprietary sequences (Applied Biosystems, catalogue number 4331182).

Northern analysis. Total RNA (6 µg) was separated on agarose gels containing MOPS and formaldehyde, transferred to nylon membranes and hybridized with CAG-repeat or mouse actin 32P-labelled oligonucleotide probes, as described previously.

Electromyography. EMG was carried out blind under anaesthesia, as described
previously. Myotonic discharges were graded on a four-point scale: 0, no
myotonia; 1, occasional myotonic discharge in less than 50% of needle insertions; 
2, myotonic discharge in greater than 50% of needle insertions; 3, myotonic 
discharge with nearly every insertion. 
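The four-point myotonia scale can be expressed as a simple mapping from needle-insertion counts to a grade. Two choices in this sketch are assumptions. The grade-3 cut-off (`nearly_every`, set to 0.9) is assumed because the text says only "nearly every insertion". A fraction of exactly 50%, which the published wording does not cover, is assigned grade 2.

```python
def myotonia_grade(n_insertions: int, n_with_discharge: int,
                   nearly_every: float = 0.9) -> int:
    """Map EMG needle-insertion counts to the 0-3 myotonia scale.

    `nearly_every` is a hypothetical cut-off for grade 3, and a fraction of
    exactly 0.5 is placed in grade 2; neither choice comes from the Methods.
    """
    if n_insertions <= 0 or not 0 <= n_with_discharge <= n_insertions:
        raise ValueError("need insertions > 0 and 0 <= discharges <= insertions")
    frac = n_with_discharge / n_insertions
    if frac == 0:
        return 0   # no myotonia
    if frac < 0.5:
        return 1   # occasional discharge, fewer than 50% of insertions
    if frac < nearly_every:
        return 2   # discharge in more than half of insertions
    return 3       # discharge with nearly every insertion


print([myotonia_grade(10, k) for k in (0, 3, 7, 10)])
```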

RT-PCR analysis of alternative splicing. RT-PCR was carried out using the SuperScript III One-Step RT-PCR System with Platinum Taq DNA Polymerase (Invitrogen) and gene-specific primers for cDNA synthesis and PCR amplification. The primers for Clcn1, Serca1, Titin and Zasp were described previously. PCR products were separated on agarose gels, stained with SybrGreen I Nucleic Acid Gel Stain (Invitrogen) and imaged using a laser scanner (Fujifilm LAS-3000 Intelligent Dark Box or GE Healthcare Typhoon 9400). Band intensities were quantified using ImageQuant software (GE Healthcare).

Transcriptome analysis by microarray. RNA was isolated from quadriceps muscle of wild-type mice or of HSA^LR transgenic mice treated with vehicle (saline), ASO 445236 or ASO 190401 (n = 4 per group; 25 mg kg^-1 ASO twice weekly for 4 weeks). RNA integrity was verified (RIN values >7.5 on an Agilent Bioanalyzer). RNA was processed to cRNA and hybridized on microbeads using MouseRef-8 v2.0 Expression BeadChip Kits (Illumina) according to the manufacturer's recommendations. Image data were quantified using BeadStudio software (Illumina). Before normalization, row-specific offsets were applied so that no value was less than two, and signal intensities were then quantile normalized. Data from all probe sets with six or more nucleotides of CUG, UGC or GCU repeats were suppressed. This eliminated the possibility that expanded repeats in the hybridization mixture (CAG repeats in cRNA, originating from CUG^exp RNA) could cross-hybridize with repeat sequences on probes. To eliminate genes whose expression was not readily quantified on the arrays, we suppressed probes that did not show a detection probability of P < 0.1 for all samples in the group with the higher mean expression level. Comparisons between groups were summarized and rank ordered by fold-change of mean expression level and by t-tests. The software package R (ref. 38) was used to perform principal components analysis (PCA) on wild-type, ASO-treated and saline-treated microarray samples. The first three principal components captured the majority of the expression variation among samples, and the first three principal component scores of each sample were plotted. Array data have been submitted to the Gene Expression Omnibus, accession number GSE38962 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38962).
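The preprocessing steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' R code:

- The row offset is read as shifting each probe just enough that its minimum across arrays is at least two.
- Ties in quantile normalization are broken arbitrarily by sort order.
- No log transform is applied.
- The detection-P filter is omitted.
- The probe sequences and intensities in the usage lines are invented.

```python
import re

import numpy as np

# Every 6-nt stretch of a (CTG)n-type repeat is one of these three 6-mers.
REPEAT_6NT = re.compile("CTGCTG|TGCTGC|GCTGCT")


def drop_repeat_probes(probe_seqs, data):
    """Remove probe rows carrying >= 6 nt of CUG/UGC/GCU repeat (checked as DNA)."""
    keep = np.array([REPEAT_6NT.search(s.upper().replace("U", "T")) is None
                     for s in probe_seqs])
    return data[keep], keep


def offset_rows(data, floor=2.0):
    """Shift each probe row up just enough that its minimum is at least `floor` (assumed reading)."""
    shift = np.maximum(0.0, floor - data.min(axis=1, keepdims=True))
    return data + shift


def quantile_normalize(data):
    """Give every array (column) the same distribution: the mean of the sorted columns."""
    order = np.argsort(data, axis=0)
    mean_sorted = np.sort(data, axis=0).mean(axis=1)
    out = np.empty_like(data, dtype=float)
    for j in range(data.shape[1]):
        out[order[:, j], j] = mean_sorted
    return out


def pca_scores(data, n_components=3):
    """Project arrays (columns) onto the first principal components via SVD."""
    x = data.T - data.T.mean(axis=0)          # arrays x probes, centred per probe
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Usage with invented data: 6 probes x 8 arrays.
rng = np.random.default_rng(0)
raw = rng.gamma(2.0, 50.0, size=(6, 8))
seqs = ["ACGTTGCA", "GGCTGCTGAA", "TTTT", "ACACAC", "GATTACA", "CCGG"]
filtered, kept = drop_repeat_probes(seqs, raw)   # the second probe contains CTGCTG
norm = quantile_normalize(offset_rows(filtered))
scores = pca_scores(norm)                        # one row of PC1-PC3 scores per array
```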

Fluorescence in situ hybridization. Localization of CUG^exp RNA by fluorescence in situ hybridization (FISH) was carried out on muscle cryosections from ASO- or saline-treated mice, as described previously, using a CAG-repeat oligoribonucleotide probe labelled with Texas Red at the 5′ end. Images are maximum projections of deconvolved Z-plane stacks (9 images, 0.1- or 0.2-µm steps) captured under identical exposure and illumination conditions using a fluorescence microscope (Carl Zeiss Axioplan 2 or Nikon Eclipse E600), a charge-coupled device (CCD) digital camera (Hamamatsu ORCA R2 or Photometrics Cool Snap HQ) and Metamorph software (Molecular Devices). The Optigrid structured illumination imaging system (Qioptiq) was also used to capture images of DM328XL muscle. Maximum grey-level intensity was quantified using Metamorph. Objectives: ×100 Plan-APOCHROMAT 1.4 NA oil (Zeiss) or ×60 Plan Apo 1.4 NA oil (Nikon).

Muscle-fibre morphometry. To outline muscle fibres and label nuclei, 10-µm transverse cryosections of muscles from ASO- or saline-treated mice were fixed with 4% paraformaldehyde, pH 7.3, washed in PBS and incubated in 20 µg ml^-1 FITC-wheat germ agglutinin (WGA; Sigma) and 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI; 1:20,000) in PBS for 1 h at room temperature. Sections were then washed in PBS, mounted and sealed. Images were captured using an Axioplan 2 fluorescence microscope (Zeiss), an ORCA R2 CCD digital camera (Hamamatsu Photonics), Metamorph software and a ×20 Plan-NEOFLUAR 0.5 NA objective (Zeiss). The calipers application in Metamorph was used to determine muscle-fibre diameter. Diameter was defined as the minimum Feret's diameter, that is, the minimum distance between parallel tangents at opposing borders of the muscle fibre. Haematoxylin and eosin (H&E)-stained images were captured using an Infinity2-1 1.4 megapixel colour CCD digital camera (Lumenera), Infinity Analyze 5.0 software (Lumenera) and a ×10 Plan-NEOFLUAR 0.3 NA objective (Zeiss).
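Metamorph's calipers tool is proprietary, but the quantity it reports can be computed from a fibre outline. The sketch below assumes the outline is available as a list of (x, y) boundary points in calibrated units, which is a hypothetical input format. It relies on the standard result that, for a convex shape, the minimum distance between parallel tangents is attained when one tangent lies along an edge of the convex hull. It is not Metamorph's implementation.

```python
from math import hypot


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_feret(points):
    """Minimum distance between two parallel tangents enclosing the outline."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("outline must enclose a non-zero area")
    best = float("inf")
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = hypot(x2 - x1, y2 - y1)
        # Width along this edge's normal: farthest hull vertex from the edge's line.
        width = max(abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / length
                    for x, y in hull)
        best = min(best, width)
    return best


# Hypothetical outline: a 4 x 2 rectangle, whose minimum Feret diameter is its short side.
print(min_feret([(0, 0), (4, 0), (4, 2), (0, 2)]))
```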

5′ rapid amplification of cDNA ends analysis. 5′ rapid amplification of cDNA ends (RACE) was carried out using the FirstChoice RLM-RACE Kit (Invitrogen). In brief, 1 µg of total mRNA was ligated with the 5′ RACE adaptor (5′-GCUGAUGGCGAUGAAUGAACACUGCGUUUGCUGGCUUUGAUGAAA-3′), then reverse transcribed with a primer specific for the cleavage fragment (5′-TGAGAAGTCGCGTGCTGGAG-3′ for 190401, or 5′-TTTTTTTTACGCAGC-3′ for 445236). The synthesized cDNA was treated with RNase H, then amplified with the 5′ RACE Outer Primer and 5′-TTGCGGTGGACGATGGAAGG-3′ (for the 190401 fragment), or 5′-TGTGTAAAACGACGGCCAGTACGCAGCTTAACAGAATGAC-3′ (for the 445236 fragment). The PCR products were analysed on agarose gels stained with SYBR Green I (Invitrogen) and scanned with a laser fluorimager (Typhoon, GE Healthcare).

RNase H1 short interfering RNA experiments. MHT cells were cultured in DMEM supplemented with 10% fetal calf serum, streptomycin (0.1 mg ml^-1) and penicillin (100 U ml^-1). Short interfering RNA (siRNA) treatments were carried out using Opti-MEM containing 5 mg ml^-1 Lipofectamine 2000, as previously described. In brief, MHT cells were plated at 7,500 cells per well and incubated for either 24 or 48 h with 75 nM siRNA in Opti-MEM and Lipofectamine 2000. The siRNA targeted either RNase H1 (5′-GCTTGGTGAGACGTGCTTATT-3′ and 5′-TAAGCACGTCTCACCAAGCTG-3′) or Ap2M1 (sequences reported previously). Twenty-four hours after transfection, cells were treated with increasing doses of the Malat1-targeting ASO 399479 in DMEM-10% FBS. Twenty-four hours after the addition of oligonucleotides, cells were lysed, RNA was isolated using RNeasy, and qRT-PCR was performed to determine the level of Malat1 transcript.

Tissue drug-level determination. Approximately 30 to 100 mg of liver and muscle tissue were homogenized as described previously. Capillary gel electrophoresis (CGE) methods were used to measure unlabelled drug concentrations in mouse tissues. The methods for the hACTA1 ASOs were slight modifications of previously published methods, and consisted of a phenol-chloroform (liquid-liquid) extraction followed by a solid-phase extraction. An internal standard (ASO 355868, a 27mer 2′-O-methoxyethyl-modified phosphorothioate oligonucleotide) was added before extraction. Tissue sample analyses were conducted using a Beckman MDQ capillary electrophoresis instrument (Beckman Coulter). Tissue-sample concentrations were calculated using calibration curves, with a lower limit of quantification (LLoQ) of approximately 1.14 µg g^-1.

Biochemical analysis and serum chemistry. Serum separated in serum separator 
tubes (BD catalogue number 365956) was used to determine aspartate transaminase 
(AST), alanine transaminase (ALT), blood urea nitrogen (BUN) and creatinine 
values using Olympus reagents and an Olympus AU400e analyser (Melville). 

Evans blue dye uptake assay. Evans blue dye (EBD) was dissolved in PBS at a concentration of 10 mg ml^-1 and filter-sterilized. HSA^LR, FVB/N, mdx or C57BL/10 mice received an intraperitoneal injection of 10 µl EBD solution per gram of body weight. After 24 h, muscle tissues were collected (quadriceps, gastrocnemius, tibialis anterior, diaphragm and heart). The mass of each muscle was recorded before each sample was lysed individually in a microfuge tube containing N,N-dimethylformamide and a 5-mm steel bead, with 30 Hz shaking in a Qiagen TissueLyser II. Lysed muscle samples were heated at 55 °C and centrifuged, and the absorbance of the supernatant was determined by spectrophotometric measurement at 636 nm. A standard curve of EBD in N,N-dimethylformamide was used to determine the EBD content of individual muscle samples.

Statistical analysis. Group data are presented as mean ± s.d., except where mean ± s.e.m. is indicated. Between-group comparisons were carried out using a two-tailed Student's t-test or an analysis of variance (ANOVA), as indicated. A P value of <0.05 was considered significant.
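For reference, the pooled-variance (Student's) t statistic and the s.e.m. used in the figure legends can be computed with the Python standard library, as below. The group values are hypothetical. Converting t into a two-tailed P value requires the t distribution. If SciPy is available, `scipy.stats.ttest_ind(a, b)` should return the same statistic together with that P value, since its default `equal_var=True` is the Student form.

```python
from math import sqrt
from statistics import mean, stdev, variance


def sem(values):
    """Standard error of the mean: sample s.d. divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def students_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical normalized transcript levels: ASO-treated (n = 4) versus saline (n = 3).
aso = [0.18, 0.22, 0.15, 0.20]
saline = [1.00, 0.91, 1.09]
t, df = students_t(aso, saline)
print(f"t = {t:.2f}, df = {df}, s.e.m.(ASO) = {sem(aso):.3f}")
```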
Medulloblastomas are the most common malignant brain tumours in children. Identifying and understanding the genetic events that drive these tumours is critical for the development of more effective diagnostic, prognostic and therapeutic strategies. Recently, our group and others described distinct molecular subtypes of medulloblastoma on the basis of transcriptional and copy number profiles. Here we use whole-exome hybrid capture and deep sequencing to identify somatic mutations across the coding regions of 92 primary medulloblastoma/normal pairs. Overall, medulloblastomas have low mutation rates consistent with other paediatric tumours, with a median of 0.35 non-silent mutations per megabase. We identified twelve genes mutated at statistically significant frequencies, including genes previously known to be mutated in medulloblastoma, such as CTNNB1, PTCH1, MLL2, SMARCA4 and TP53. Recurrent somatic mutations were newly identified in an RNA helicase gene, DDX3X, often concurrent with CTNNB1 mutations, and in the nuclear co-repressor (N-CoR) complex genes GPS2, BCOR and LDB1. We show that mutant DDX3X potentiates transactivation of a TCF promoter and enhances cell viability in combination with mutant, but not wild-type, β-catenin. Together, our study reveals the alteration of WNT, hedgehog, histone methyltransferase and now N-CoR pathways across medulloblastomas and within specific subtypes of this disease. It also nominates the RNA helicase DDX3X as a component of pathogenic β-catenin signalling in medulloblastoma.

Medulloblastomas are aggressive tumours of primitive neuroectodermal origin. More than one third of patients diagnosed with medulloblastoma succumb to their disease within 5 years, and surviving patients often have significant long-term adverse effects from current therapies. Identifying the underlying genetic events responsible for medulloblastomas can help guide the development of more effective therapies and refine the selection of currently available chemotherapy and radiotherapy. Recent efforts profiling transcriptional and DNA copy number changes in medulloblastoma have provided insights into the biological processes involved in these tumours and have underscored the molecular heterogeneity of this disease. Based on these data, four broad subgroups have been established, known according to a consensus nomenclature as SHH, WNT, Group 3 and Group 4 (ref. 5). The first genome-scale sequencing of protein-coding regions in medulloblastoma was reported recently. Altered genes encoding histone-modification proteins were identified in 20% of cases, most notably MLL2 and MLL3 (ref. 7). This initial survey was limited by a small discovery sample size (22 patients), by a lack of subtype-specific analysis, and by the use of Sanger sequencing technology, which is insensitive to variants present at low allelic fraction. Here we survey coding somatic mutations at deeper coverage in a larger cohort of 92 medulloblastoma/normal pairs and assess these mutations in the context of specific molecular subtypes (Supplementary Table 1).

In total, 1,908 mutations were detected within 1,671 of 18,863 genes sequenced to a median of 106× coverage (Supplementary Table 2). Confirmation of 20 candidate mutations in selected genes (CTNNB1, DDX3X, SMARCA4, TP53 and CTDNEP1) was performed by amplification of 48 exons using a microfluidic PCR device (Fluidigm), followed by single-molecule real-time sequencing (SMRT, Pacific Biosciences) (Supplementary Information). Sequence data were unavailable for one DDX3X mutation because of poor PCR amplification from the sample. All remaining 19 mutations were confirmed by this orthogonal method (median 73 redundant sub-reads, range 3–287; Supplementary Fig. 1).

A median of 16 somatic mutations (12 non-silent, 4 silent) per tumour was identified, corresponding to a mutation rate of 0.35 non-silent mutations per megabase of callable sequence. This rate is lower than in most adult solid tumours and consistent with results from ref. 7. Six of the twelve
most frequently mutated tumours were from the oldest patients 
(16-31 years at diagnosis), consistent with increased mutation fre- 
quency in adult versus childhood medulloblastomas (P = 7.7 X 10>, 
Wilcoxon rank-sum test, Supplementary Fig. 2). 
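The rate quoted above is the number of non-silent mutations divided by the callable territory, computed per tumour. A minimal sketch follows. The per-tumour callable territory is not reported in this section, so the 34 Mb figure in the example is purely illustrative. Taking the median of per-tumour rates, rather than dividing a median count by a median territory, is an assumption about how the cohort summary was formed.

```python
from statistics import median


def mutations_per_mb(n_nonsilent: int, callable_bases: int) -> float:
    """Non-silent somatic mutations per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable territory must be positive")
    return n_nonsilent / (callable_bases / 1_000_000)


def cohort_median_rate(tumours):
    """Median of per-tumour rates; `tumours` holds (n_nonsilent, callable_bases) pairs."""
    return median(mutations_per_mb(n, b) for n, b in tumours)


# Illustrative only: 12 non-silent mutations over a hypothetical 34 Mb callable territory.
print(round(mutations_per_mb(12, 34_000_000), 2))
```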

To identify genes mutated at statistically significant frequencies across our cohort, we used the MutSig algorithm. MutSig takes into account gene size, sample-specific mutation rate, non-silent to silent mutation ratios, clustering within genes, and base conservation across species. In our cohort of 92 samples, we identified 12 significantly mutated genes (q < 0.1; Table 1 and Supplementary Table 3). Strikingly, these genes were not mutated in c5 (Group 3) and c4 (Group 4) tumours with extensive somatic copy number alteration (Fig. 1). This indicates that these subtypes are driven primarily by structural variation rather than by base mutation. Not unexpectedly, CTNNB1 (β-catenin) and PTCH1 were the two most significantly mutated genes (see Table 1 and Fig. 1). Point mutations of CTNNB1 in combination with loss of chromosome 6 were found in all WNT subgroup tumours. These mutations were concurrent with several other recurrently mutated genes, namely CSNK2B, DDX3X, TP53 and SMARCA4. Mutations involving PTCH1 occurred exclusively in SHH subgroup tumours, and mutations of genes associated with the hedgehog pathway were also restricted to this subgroup (P < 0.0001, Fisher's exact test). All but one of the tumours with PTCH1 mutations had somatic loss of 9q, resulting in hemizygosity for the mutant allele. The remaining tumour had apparent copy-neutral loss of heterozygosity of 9q22. Other
Table 1 | Genes mutated at a statistically significant frequency in 92 medulloblastomas.

Gene      Description                                      Mutations  Patients  Unique sites  Silent  Missense  Indel or null  Double null  q
CTNNB1    β-catenin                                        6          6         4             0       6         0              0            <1.8 × 10^-11
PTCH1     Patched homologue 1 (Drosophila)                 7          7         7             0       0         7              0            4.0 × 10^?
MLL2      Myeloid/lymphoid or mixed-lineage leukaemia 2    10         8         10            0       2         4              4            4.0 × 10^?
DDX3X     DEAD box polypeptide 3, X-linked                 7          7         7             0       7         0              0            2.3 × 10^-8
GPS2      G protein pathway suppressor 2                   3          3         3             0       1         2              0            1.2 × 10^?
TP53      Tumour protein p53                               3          3         3             0       3         0              0            0.039
KDM6A     UTX, lysine (K)-specific demethylase 6A          3          3         3             0       2         1              0            0.042
BCOR      BCL6 co-repressor                                3          3         3             0       0         3              0            0.046
SMARCA4   ATP-dependent helicase                           4          4         3             0       4         0              0            0.046
LDB1      LIM domain binding 1                             2          2         2             0       1         1              0            0.047
CTDNEP1   CTD nuclear envelope phosphatase 1               2          2         2             0       0         2              0            0.047
CSNK2B    Casein kinase 2, β polypeptide                   2          2         2             0       2         0              0            0.071

Null, nonsense, frameshift or splice-site mutations; double null, null mutations co-occurring in a single tumour; q, q-value, false discovery rate (Benjamini–Hochberg procedure). See Supplementary Table 3 for further statistics and subtype analysis.


somatic mutations of hedgehog pathway members include a splice site 
mutation in SUFU, an in-frame deletion in WNT6, and missense 
mutations in GLI2, SMO, PRKACA, WNT2 and WNT2B. 

Two patients with SHH subgroup tumours had germline variants in PTCH1. One had somatic loss of 9q, resulting in hemizygosity for a loss-of-function germline allele (MD-085, c.3030delC, p.Asn1011Thrfs*38). The other had a substitution previously reported in patients with holoprosencephaly (MD-286, p.T1052M, ref. 9). Two additional cases (MD-097 and MD-335) had loss-of-function variants in SUFU (one frameshift deletion and one nonsense). Both variants were heterozygous in the germline and became hemizygous in the tumour, owing to somatic loss of chromosome 10 in one case and copy-neutral loss of heterozygosity in the other.

MLL2 was also subject to recurrent inactivating mutations, consist- 
ent with findings from ref. 7 and providing further evidence for 
dysregulated histone modification in medulloblastoma. Indeed, six
of the twelve most significantly mutated genes are involved in histone
modification and/or related chromatin remodelling complexes
(MLL2, GPS2, KDM6A, BCOR, SMARCA4 and LDB1; see Table 1).
As a gene set, histone methyltransferases (HMTs) were enriched for
somatic mutation, with 21 tumours having apparent, predominantly
loss-of-function, HMT mutations (q = 5.8 × 10^-?; Fig. 2 and
Supplementary Table 4).

Figure 1 | Demographic characteristics, molecular subtypes and selected
copy number alterations and somatic mutations across 92 medulloblastoma
cases. Data tracks describing 92 medulloblastoma cases. Identifier, unique
name used to denote each case. Identifiers also link samples to those analysed in
ref. 2. Sex, males in blue, females in pink. Age, years of age at diagnosis binned as
infants, children or adults. Histology, pathology review of primary tissue
specimen. Subtypes, based on copy number profiles derived from sequence or
microarray data. Consensus subtypes from refs 2 and 5 as published. Copy
number alterations, selected copy number alterations used to assign tumours to
subtypes. Blue, losses; red, gains. Somatic mutations, gene names (HUGO
symbols) grouped by functional category. MutSig gene names are in bold.
Black, missense mutations; orange, nonsense/splice site/indel mutations;
purple, silent mutations; green, germline variants.

Subtype-specific MutSig analysis identified additional significant 
mutations of histone-modifying genes, MLL3 and HDAC2, in Group 
4 tumours along with KDM6A mutations (q = 0.039 and 0.066, see 
Supplementary Table 3). Interestingly, mutations in KDM6A
occurred exclusively in tumours with an i17q as the sole autosomal
alteration (P = 0.0023, Fisher’s exact test), and the one female case
with a KDM6A mutation also had loss of one X chromosome.
Notably, the two ‘i17q only’ tumours without KDM6A mutations 
had other histone-modifying enzymes mutated, namely THUMPD3, 
ZMYM3 and MLL3, perhaps suggesting a distinct biology for tumours 
with this karyotype. 

Mutations in several genes encoding components of the nuclear 
co-repressor (N-CoR) complex were observed at a statistically signifi- 
cant frequency: BCOR in 3 tumours, GPS2 in 3 tumours, and LDB1 in 2 
tumours. BCOR mutations have recently been reported at high fre- 
quency in retinoblastoma” and in ‘copy-neutral’ acute myelogenous 
leukaemia’. BCOR is located on the X-chromosome and two 
hemizygous frameshift mutants were found in tumours from males 
(allele fractions 0.90 and 0.92). A third nonsense mutation was also 
found in a male but at low allelic fraction (0.12), indicating a subclonal 
event. Two out of three BCOR mutations occurred in SHH subgroup 
tumours. LDB1 missense and nonsense mutations were found in two 
additional SHH tumours, both appearing hemizygous due to loss of 
10q and complete chromosome 10 loss, respectively (allele fractions 
0.81 and 0.78). Both BCOR and LDB1 promote assembly of the 
repressive N-CoR complex” and harbour apparent loss of function 
mutations. GPS2, which encodes a critical subunit of the N-CoR com- 
plex, a repressor of JNK/MAPK signalling through partnership with 
histone deacetylases’*, was mutated in two Group 3 tumours. The 
GPS2 mutations cluster within amino acids 53-90, the domain critical 
for heterodimerization with NCOR2 (also known as SMRT) and 
interacting with a TBL1 amino-terminal domain tetramer to assemble 
the N-CoR repression complex”. Finally, an additional nonsense muta- 
tion in NCOR2 was identified in a single SHH subgroup tumour, under- 
scoring the central role of N-CoR dysregulation in medulloblastoma 
development and particularly within the SHH subgroup. 

Several genes encoding subunits of the SWI/SNF-like chromatin- 
remodelling complex were also mutated in our cohort, including sig- 
nificant recurrent mutations of SMARCA4(Brg/BAF190), which 
encodes a DNA helicase with ATPase activity’’ and has been reported 
to be mutated in lung, ovarian, and pancreatic cancers! as well as 
medulloblastoma’”*. In our cohort, SMARCA4(Brg/BAF190) mutations 
clustered in helicase domains and occurred in three Group 3 tumours 
(significant within the c1 subtype, q = 0.019), and one WNT tumour.
In addition, mutations were found in the alternative ATPase subunit 
SMARCD2(Brm) (missense at a highly conserved residue) and two 
other members of the SWI/SNF complex, ARID1B(BAF250b) (2 base 
pairs (bp) frameshift deletion) and SMARCC2(BAF170) (splice site). 
These were all apparent loss-of-function mutations and occurred in 
SHH tumours. Thus, it seems that disruption of this complex is fre- 
quent across medulloblastomas. 

New and hemizygous mutations were found in CTDNEP1 (previ- 
ously known as DULLARD), a phosphatase with roles in Xenopus 
neural development through regulation of BMP receptors’®, and as a 
direct regulator of LIPIN, an integral component of the mTOR com- 
plex’’. CTDNEP1 mutations were found in two Group 3 tumours 
(significant within the subtype, q = 0.0087), a 2-bp frameshift deletion 
and a substitution disruptive of a splice site. Both tumours have i17q 
chromosomes, resulting in loss of the wild-type allele at 17p13. 
Figure 2 | Location of mutations in histone methyltransferases, RNA
helicases and N-CoR complex-associated genes. Location of somatic 
mutations on linear protein domain models of genes from sets frequently 
mutated in medulloblastoma. All domain annotations are from UniProt and 
InterPro annotations. Diagrams were constructed using Domain Graph 
(DOG)”**, version 2.0. a, Histone methyltransferase domains: red, SET; green, 
coiled-coil; blue, zinc-finger; cyan, other. b, N-CoR complex-associated domains: 
purple, anti-parallel coiled-coil domains required for GPS2-NCOR2 (SMRT) 
interaction”; yellow, other interaction domains as labelled (SANT domains bind
DNA, CoRNR domains bind nuclear receptors, ANK repeats mediate a diversity
of protein-protein interactions, and LIM-binding domains bind a common 
protein structural motif). c, RNA helicase domains: cyan, helicase and helicase- 
associated (InterPro); red, RNA-binding and RNA polymerase sigma factor 
(InterPro); blue, ATP binding site; green, DEAD or DExH box motif. See 
Supplementary Table 1 for UniProt protein model identifiers. 


Mutations in DDX3X, an ATP-dependent RNA helicase with func- 
tions in transcription, splicing, RNA transport and translation’’, were 
found in seven tumours, including half of the WNT pathway tumours 
(P = 0.005, Fisher’s exact test) and several SHH subgroup tumours. 
DDX3X mutations have recently been reported at low frequency in five
other tumour types (Catalogue of Somatic Mutations in Cancer, 
COSMIC") but the significance of these mutations for DDX3X func- 
tion remains unclear. To understand the consequence of observed 
point mutations on the physical structure of DDX3X, we mapped 
the mutations onto the previously reported crystal structure of 
DDX3X” and its orthologue DDX4 (also known as VASA, ref. 20) 
(Fig. 3a; Supplementary Fig. 3 and Supplementary Table 5). The 
mutations seem to cluster in two structural domains, a helicase 
ATP-binding domain (residues 211-403) and a helicase carboxy- 
terminal domain (residues 414-575). The location of these mutations 
indicates that they may alter DDX3X-RNA interaction (Fig. 3a and 
Supplementary Table 5). 
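The DDX3X-WNT association can be checked with a small exact-test sketch. It assumes the 2×2 table implied by the text and the Methods (3 of the 6 WNT tumours and 4 of the remaining 86 tumours carry DDX3X mutations); under that assumption a two-sided Fisher's exact test computed this way comes out at about 0.005, in line with the reported value. The function name and table layout are illustrative, not the authors' code.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the row and column totals held fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(k: int) -> Fraction:
        # Exact probability that the top-left cell equals k.
        return Fraction(comb(row1, k) * comb(n - row1, col1 - k), total)

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return sum((prob(k) for k in range(lo, hi + 1) if prob(k) <= observed), Fraction(0))

# Assumed table: rows = WNT / other subgroups, columns = DDX3X mutant / wild type.
p = fisher_exact_two_sided(3, 3, 4, 82)
print(f"{float(p):.4f}")
```

Exact fractions are used so that the "no more likely than observed" comparison is not affected by floating-point rounding.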

As half of the β-catenin-mutated tumours contained concurrent
DDX3X mutations, we investigated whether DDX3X could enhance
the ability of β-catenin to transactivate a TCF4-luciferase reporter
(TOPflash) and whether DDX3X/β-catenin co-expression had a measurable
effect on cell viability/proliferation. In combination with wild-type
β-catenin, neither wild-type nor mutant DDX3X alone significantly
transactivated the TOPflash reporter. However, in combination with
mutant β-catenin (S33Y substitution), the majority of DDX3X point
mutants in our cohort potentiated reporter activity (P < 0.05, Fig. 3b).
This potentiation was also apparent in cell viability assays in both HeLa
(data not shown) and D425 medulloblastoma cell lines (P < 0.05,
Fig. 3c).

Figure 3 | Functional consequence of DDX3X point mutations. a, Three-dimensional
model of the two recA-like domains of human DDX3X in complex
with single-stranded RNA and a Mg-ATP analogue. Displayed are the residues
mutated in the amino-terminal recA-like domain (R276K, D354H, R376C) and
C-terminal recA-like domain (D506Y, R528H, R534H, P568L). Colouring:
light blue, DDX3X residues 166-405; dark blue, DDX3X residues 406-582;
cyan, single-stranded RNA; magenta and green, Mg-ATP analogue. Molecular
graphics images were produced using the University of California, San Francisco (UCSF) Chimera
package” (http://www.cgl.ucsf.edu/chimera). b, Mutant DDX3X potentiates
mutant β-catenin transactivation of the TOPflash promoter. Shown is relative
luciferase activity in 293T cells co-transfected with TOPflash reporter, FOPflash
control, and either wild-type or mutant DDX3X in combination with wild-type
or mutant β-catenin. A one-dimensional model of DDX3X is displayed above the bar
graphs to illustrate the position of the mutations. WT, wild type. c, Cell viability
assays of medulloblastoma D425 cells stably transduced with either wild-type or
mutant DDX3X lentivirus in combination with either wild-type or mutant
β-catenin lentivirus. For b and c, error bars depict the standard deviation of the
mean from five replicate experiments performed for each condition. Student’s
t-tests were performed to evaluate significance of differences in TOPflash
intensity or cell proliferation value distributions as follows: increases with
DDX3X alone versus empty vector, increases with wild-type β-catenin versus
DDX3X alone, increases with mutant β-catenin versus DDX3X alone, and
increases with mutant β-catenin versus wild-type β-catenin.

Given the apparent importance of DDX3X mutations in medullo- 
blastoma, we searched the genes listed in the RNA Helicase Database 
(http://www.rnahelicase.org/) for low frequency mutations in medul- 
loblastoma. We found five tumours with mutations within RNA heli- 
case or RNA binding domains of DHX9, DHX32, DHX57, FANCM 
and SKIV2L (Fig. 2 and Supplementary Table 6). The missense muta- 
tions were at conserved residues and predicted to be deleterious by the 
software packages SIFT, AlignGVGD and PolyPhen2. In addition, a 
frameshift insertion in SETX occurs upstream of, and probably dis- 
rupts, its RNA helicase domain. Overall, 15% of medulloblastomas 
seem to have some disruption of RNA helicase activity. 

In summary, we report a next-generation sequencing analysis of 
medulloblastoma, the most common malignant brain tumour in 
children. Our results reveal mutations in several known pathways such 
as histone methylation (MLL2 and others), sonic hedgehog (PTCH1, 
SUFU and others) and Wnt (CTNNB1 and others), and also previously 
unrecorded mutations in genes including DDX3X, BCOR, LDB1 and 
GPS2. Our preliminary functional studies implicate DDX3X as a 
candidate component of pathogenic WNT/β-catenin signalling. In a
broader sense, DDX3X mutations have recently been reported in 
chronic lymphocytic leukaemia” and head and neck cancers”, both 
of which have subsets of tumours with dysregulated WNT signalling. 
Studies investigating whether mutant DDX3X functions together with 
β-catenin in these contexts should provide additional insights into this
multifaceted molecule and open potential avenues for novel therapies. 
Finally, the delineation of nuclear receptor co-repressor complex 
molecules as altered in medulloblastoma provides new insight into 
the pathogenesis of this deadly childhood disease. 


METHODS SUMMARY 


Informed consent was provided by families of medulloblastoma patients treated at 
Children’s Hospital Boston, The Hospital for Sick Children, Toronto, Canada, and 
institutions contributing to the Children’s Oncology Group/Cooperative Human 
Tissue Network, under approval and oversight by their respective Institutional Review
Boards. All tumours were obtained at the initial surgical resection and recurrent 
tumours were excluded from our analysis. Haematoxylin- and eosin-stained slides 
of tumour samples were reviewed by a pathologist to confirm the diagnosis of 
medulloblastoma, determine histological subtype when possible, and assess 
tumour purity. DNA was isolated from tumour specimens and matched 
peripheral blood as previously described’. Exome sequencing of DNA from 92 
tumour/normal pairs was performed using in-solution hybrid-capture of 193,094 
exons from 18,863 microRNA (miRNA)- and protein-coding genes, followed by
sequencing of 76-bp paired-end reads using Illumina sequencing-by-synthesis
technology**. Reads were aligned to human genome build GRCh37 using a
Burrows-Wheeler aligner (BWA). The ~33-megabase target region was
sequenced to a mean coverage of 106× per sample (range 73-234×). Gene
expression data and copy number profiles (derived from SNP microarrays or 
sequence data) were used to assign each tumour to a subgroup using published 
criteria’. Our cohort consisted of 6 WNT (c6), 23 SHH (c3), 33 Group 3 (12 c1, 21
c5), and 30 Group 4 (12 c2, 18 c4) tumours (see Supplementary Table 1 for case 
annotations). Mutations were detected using muTect, annotated using 
Oncotator”’, and manually reviewed using the Integrated Genomics Viewer 
(IGV)’’. For validation, PCR on Access Array microfluidic chips (Fluidigm) was 
followed by single-molecule real-time sequencing (Pacific Biosciences) as per 
manufacturer’s instructions. Sub-reads were extracted and assigned to samples 
using manufacturer’s and custom software, and aligned to the hg19 (GRCh37) 
build of the human reference genome sequence using BWA-SW”. Candidate 
mutations were confirmed by manual review using IGV”’ (Supplementary Fig. 1). 
See Supplementary Information and http://www.broadinstitute.org/cancer/cga/ for 
complete descriptions of materials and methods. 
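The sequencing figures above can be sanity-checked with simple arithmetic. This back-of-envelope sketch assumes every aligned base falls inside the ~33-Mb target and ignores duplicate and off-target reads, so it gives only a lower bound on the sequencing actually required; it also confirms that the subgroup counts add up to the 92 tumour/normal pairs. The constant names are illustrative.

```python
# Back-of-envelope exome arithmetic. Assumptions: every aligned base lies in
# the capture target, and duplicate or off-target reads are ignored.
TARGET_BP = 33_000_000   # ~33-Mb capture target
MEAN_COVERAGE = 106      # mean fold-coverage quoted in the Methods
READ_LENGTH = 76         # bases per read; each mate of a pair counts once

on_target_bases = TARGET_BP * MEAN_COVERAGE
reads_needed = on_target_bases / READ_LENGTH
print(f"{on_target_bases / 1e9:.2f} Gb on target, ~{reads_needed / 1e6:.0f} million reads")

# Subgroup counts from the Methods (consensus subtypes summed per subgroup).
subgroups = {"WNT": 6, "SHH": 23, "Group 3": 12 + 21, "Group 4": 12 + 18}
print(sum(subgroups.values()))  # should equal the 92 tumour/normal pairs
```

Under these assumptions, about 46 million 76-bp reads (roughly 23 million pairs) per sample are needed to reach 106× on the target.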
A map of the cis-regulatory sequences in the mouse genome


Yin Shen, Feng Yue, David F. McCleary, Zhen Ye, Lee Edsall, Samantha Kuan, Ulrich Wagner, Jesse Dixon, Leonard Lee, Victor V. Lobanenkov & Bing Ren


The laboratory mouse is the most widely used mammalian model 
organism in biomedical research. The 2.6 × 10^9 bases of the mouse
genome possess a high degree of conservation with the human 
genome’, so a thorough annotation of the mouse genome will be 
of significant value to understanding the function of the human 
genome. So far, most of the functional sequences in the mouse 
genome have yet to be found, and the cis-regulatory sequences in 
particular are still poorly annotated. Comparative genomics has 
been a powerful tool for the discovery of these sequences’, but on its 
own it cannot resolve their temporal and spatial functions. 
Recently, ChIP-Seq has been developed to identify cis-regulatory 
elements in the genomes of several organisms including humans, 
Drosophila melanogaster and Caenorhabditis elegans**. Here we 
apply the same experimental approach to a diverse set of 19 tissues 
and cell types in the mouse to produce a map of nearly 300,000 
murine cis-regulatory sequences. The annotated sequences add up 
to 11% of the mouse genome, and include more than 70% of con- 
served non-coding sequences. We define tissue-specific enhancers and 
identify potential transcription factors regulating gene expression in 
each tissue or cell type. Finally, we show that much of the mouse
genome is organized into domains of coordinately regulated enhancers
and promoters. Our results provide a resource for the annotation of 
functional elements in the mammalian genome and for the study of 
mechanisms regulating tissue-specific gene expression. 

We identified the genomic localizations of RNA polymerase II 
(Pol II), the insulator-binding protein CCCTC-binding factor (CTCF)
and three chromatin modification marks, histone H3 lysine 4
trimethylation (H3K4me3), histone H3 lysine 4 monomethylation
(H3K4me1) and H3 lysine 27 acetylation (H3K27ac), in 13 adult
tissues, four embryonic tissues and two primary cell lines (Fig. 1a, b) 
by performing chromatin immunoprecipitation followed by high- 
throughput sequencing (ChIP-Seq)° (Supplementary Tables 1 and 2). 
Enrichment of H3K4me3 or Pol II binding signals is indicative of
an active promoter, whereas the presence of H3K4me1 or H3K27ac
outside promoter regions can be used as marks for enhancers”.
CTCF binding is considered a mark for potential insulator elements’”.
In a subset of tissue and cell types, we also performed ChIP-Seq on the
co-activator protein p300 and used its promoter-distal binding sites to 
train an enhancer prediction tool on the basis of chromatin signatures’*.
We determined the transcriptome in each tissue and cell type through
RNA-Seq experiments, using a protocol that can detect both the abundance
and strand of origin of RNA transcripts'* (Supplementary Fig. 1). By
analysing the genomic occupancy of the above chromatin marks and
transcription factors (Supplementary Methods), we identified 295,676
non-redundant cis-regulatory sequences, including 53,834 putative
promoters, 234,764 potential enhancers and 111,062 CTCF-binding sites
(Fig. 1b). With an estimated span of 1,000 base pairs for each element,
the combined length of these putative cis-regulatory sequences is 295.6
million base pairs, or 11% of the mouse genome.

Figure 1 | Identification of cis-regulatory elements in the mouse genome.
a, UCSC genome browser views of ChIP-Seq and RNA-Seq data for mESC,
heart and liver (chromosome 4). The values on the y axis for ChIP-Seq data are
input-normalized intensities. kb, kilobases. b, An overview of the predicted
regulatory elements in the 19 tissue and cell types. E14.5, embryonic day 14.5;
MEF, murine embryonic fibroblast. c, Percentages of known cis-regulatory
elements recovered in this study.
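The genome-fraction figures follow from simple multiplication. This sketch assumes, as the text does, a 1,000-bp span per element with no overlap between elements, and takes the mouse genome as about 2.6 × 10^9 bases; it also applies the same arithmetic to the 1,258 million base pairs covered by the enhancer-promoter units (EPUs) described in this paper. Constant names are illustrative.

```python
# Assumptions: every element spans exactly 1,000 bp (as the text assumes),
# elements do not overlap, and the mouse genome is ~2.6e9 bp.
N_ELEMENTS = 295_676
SPAN_BP = 1_000
GENOME_BP = 2.6e9
EPU_BP = 1_258_000_000   # total length of the 8,792 EPUs

regulatory_bp = N_ELEMENTS * SPAN_BP
print(f"cis-elements: {regulatory_bp / 1e6:.1f} Mb, {100 * regulatory_bp / GENOME_BP:.1f}% of genome")
print(f"EPUs: {100 * EPU_BP / GENOME_BP:.1f}% of genome")
```

The product is 295.676 Mb; the 295.6 Mb in the text is the same value truncated rather than rounded. The EPU total works out to about 48% of the genome, matching "nearly half".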

To determine the accuracy and completeness of our cis-regulatory 
sequence mapping, we first compared the identified promoters with 
known promoters. We recovered 79% of RefSeq-annotated promoters’* 
(Fig. 1c and Supplementary Fig. 2a) and confirmed an additional 62% 
of University of California, Santa Cruz (UCSC)-annotated promoters 
(13,205 out of 21,433) that are not annotated in RefSeq. As expected, 
annotated promoters not recovered by our study are generally 
expressed in tissues that were not investigated in this work (Sup- 
plementary Table 3). In addition to the annotated promoters, we also 
identified 13,438 novel promoters. When tested with a luciferase 
reporter, 85% of 65 randomly selected novel promoters showed sig- 
nificant promoter activity in at least one orientation (P< 0.01, 
Student’s t-test) (Supplementary Fig. 3a, b), supporting their function 
as promoters. Next we compared the predicted enhancers with a list 
of 726 experimentally validated enhancers'® and found that 82% of 
them were correctly identified in this study (Fig. 1c and Supplemen- 
tary Fig. 2b). We also randomly selected eight predicted murine 
embryonic fibroblast (MEF) enhancers for validation and found that 
six of them (75%) gave positive results (Supplementary Fig. 4) 
(P< 0.01, Student’s t-test), supporting the reliability of our enhancer 
identification method. In addition, we recovered 94.5% of previously 
reported CTCF-binding sites in mouse embryonic stem cells 
(mESCs)’’ (Fig. 1c), demonstrating the high sensitivity of our detection 
method for CTCF binding. Further, we detected 77,236 novel CTCF- 
binding sites, 87.5% of which contained the canonical CTCF motifs 
(P < 2.2 × 10^-16, binomial distribution). The novel CTCF-binding
sites tend to be more tissue-specific than the sites identified previously 
(Supplementary Fig. 5). The above evidence indicates that we have 
correctly identified most known cis-regulatory sequences and have 
uncovered many novel ones. 

Functional elements are often under negative selection during 
evolution, so a high level of sequence conservation is frequently used 
as evidence of function. However, there are also reports showing that 
transcription factor binding may be rapidly lost or gained during 
evolution’*”’, arguing that the usage of cis-elements may evolve more 
quickly. We examined the sequence conservation of different classes of 
the cis-regulatory sequences identified in this study, and found that 
promoters are characterized by the highest degree of sequence conser- 
vation (Fig. 2a). In contrast, CTCF-binding sites and enhancers have a 
much lower but still significant level of sequence conservation. We 
next assessed the level of conservation of cis-regulatory element usage 
between the mouse and human genomes in embryonic stem cells 
(ESCs)’° (Fig. 2b). More than 70% of homologous promoters are 
associated with H3K4me3 in both species, confirming a high degree 
of conservation in promoter usage (Fig. 2c, d). However, only 25.7% 
and 24.8% of enhancers and CTCF-binding sites, respectively, found 
in human ESCs are still associated with H3K4me1 or CTCF binding in
mESCs, despite a high degree of sequence conservation (Fig. 2c). These 
results suggest that the cis-regulatory elements identified in the mouse 
genome are under different selective pressure during evolution, with 
promoters being most conserved in both sequence and usage, whereas 
enhancers and CTCF-binding sites are undergoing a considerable 
degree of evolution. This result agrees well with the recent findings 
of large interspecies differences and divergence of transcriptional 
regulation’. 
Figure 2 | Evolutionary conservation of the identified cis-regulatory
elements. a, Evolutionary conservation of cis-regulatory elements, in 
comparison with exons and random genomic sequences. Asterisk, P< 0.001, 
Fisher’s exact test. b, UCSC genome browser views of chromatin state and 
CTCF-binding sites at Sox2 loci for mESCs and human ESCs (hESCs) on 
chromosome 3. DNA sequences, chromatin states and CTCF binding are all 
conserved in this region. c, Number of hESC regulatory elements that are 
conserved and predicted as regulatory elements in mESCs. d, Number of mESC 
regulatory elements that are conserved and predicted as regulatory elements in 
hESCs. e, Functional annotation of the conserved non-coding sequences based 
on the cis-regulatory elements identified in this study. The asterisk in c, d and 
e indicates CTCF-binding sites that do not overlap with either promoters or 
enhancers. 


Comparative genomic methods have identified a significant number 
of mammalian sequences as non-protein coding but undergoing nega- 
tive selection during evolution, commonly referred to as conserved 
non-protein-coding sequences (CNSs). These sequences are suspected 
to have important biological roles, yet their precise function remains to 
be defined. We compared our map of cis-regulatory elements with a list 
of CNSs” and found that 70% of them fall into one of the three classes 
of predicted cis-elements: 15% as promoters, 53% as enhancers and 2% 
as CTCF-binding sequences. Additionally, 1% of the CNSs seem to be 
non-coding RNA sequences as supported by the RNA-Seq data (Fig. 2e 
and Supplementary Fig. 2c). Most CNSs therefore seem to function in 
regulating transcription. 

We previously showed that enhancers in the human genome are 
associated with active chromatin marks in a cell-type-specific manner, 
whereas promoter and insulator elements tend to be ubiquitously 
occupied in multiple cell lines'®. Here we found that the occupancy 
of enhancers by H3K4me1 in the mouse genome is still the most tissue-specific
(Fig. 3a). In contrast, we observed that whereas H3K4me3
occupies most RefSeq promoters in multiple tissues, a significant num- 
ber of promoters, especially the novel promoters discovered in this 
study, show tissue-specific occupancies by H3K4me3 or Pol II (Fig. 3a)
(Supplementary Fig. 3d), with many of them corresponding to alter- 
natively used promoters (Supplementary Table 4 and Supplementary 
Fig. 6). We also found that most CTCF-binding sites are occupied in 
multiple tissues (Fig. 3a). The tissue-specific CTCF-binding sites 
showed significant overlap with enhancers (P< 1.8 X 110-< 72, 
binomial distribution), whereas the ubiquitous CTCF-binding sites 
overlapped significantly with promoters (P< 9.0 X 10° *’, binomial 
distribution) (Supplementary Fig. 5b, c), suggesting that a fraction of 
the CTCF-binding sites may function through promoters and 
enhancers, although the exact role of CTCF at these regions remains 
unclear. These results indicate that a large fraction of cis-regulatory 
elements are active in a tissue-specific manner and are most probably 
involved in regulating tissue-specific gene expression. 
Figure 3 | Genomic organization of co-regulated promoters and enhancers.
a, Tissue specificity of the usage of promoters (H3K4me3 and Pol II),
enhancers (H3K4me1) and CTCF-binding sites. b, Distribution of the
Spearman correlation coefficient of H3K4me1 at enhancers and Pol II at
promoters for random permutation, the nearest TSS model, and the CTCF block
model. c, Enhancers and promoters form co-regulated clusters of different
sizes, as shown by the Spearman correlation coefficient of H3K4me1 at
enhancers and Pol II at promoters on chromosome 19. d, Hi-C interaction
heatmap showing that the physical partitioning of the genome is highly
correlated with the EPUs that encompass the Pcdhα, Pcdhβ and Pcdhγ gene clusters
on chromosome 18. Top: normalized Hi-C interaction frequencies in mouse
cortex as a two-dimensional heatmap. Bottom: UCSC genome browser views of
the same regions, including the identified EPUs and the ChIP-Seq data
(H3K27ac, H3K4me1, H3K4me3, Pol II and CTCF) in cortex. e, The average
normalized Hi-C interaction frequencies for enhancer-promoter pairs within
EPUs, between EPUs, and expected by random chance.


Enhancers are important in regulating tissue-specific expression 
patterns during mammalian development. However, finding target 
genes for enhancers is not straightforward because they are frequently 
distal from the genes they control. Assigning enhancers to the nearest 
transcription start sites is the most widely used method. A recently 
published strategy associates enhancers and promoters located within 
the same domain defined by the CTCF-binding sites, assuming that 
insulators can block promoter-enhancer interactions’®. We evaluated 
these two methods by assessing the Spearman correlation coefficients 
(SCCs) between H3K4me1 signals at enhancers and the Pol II
intensities at target promoters (Supplementary Methods). As a control, 
we observed that the SCCs from the randomly paired enhancers and 
promoters have a bell-shaped distribution with a median of 0 (Fig. 3b). 
The distributions of the SCCs from enhancer-promoter pairs identified
by the nearest transcription start site (TSS) model and the CTCF block
model are only slightly better than the random control, with medians
of 0.11 and 0.08, respectively (Fig. 3b). In addition, 34% and 38% of the
enhancer/promoter pairs in the nearest TSS model and the CTCF 
block model, respectively, are negatively correlated, indicating poten- 
tially incorrect promoter assignment. To improve the linking of 
enhancers to their targets, a logistic regression classifier was recently 
introduced and shown to perform better than the nearest TSS model”’. 
However, this model is still based on the one-to-one relationship 
between an enhancer and a gene, with a bias towards the nearby genes. 
It has been reported that a significant fraction of enhancers may not 
target the nearest promoters”. Therefore, to gain a better understand- 
ing of enhancer/promoter organization we assessed the correlation of 
the chromatin state at enhancers and Pol II occupancy at promoters for
each possible pair of elements along a chromosome. We observed 
that co-regulated promoters and enhancers tend to form clusters with 
variable sizes (Fig. 3c). We developed an algorithm to detect these local 
clusters, defined as enhancer-promoter units (EPUs) (Supplementary 
Methods). Performing this analysis genome-wide, we defined 8,792 
EPUs that contained at least one promoter and one enhancer 
(Supplementary Table 5), encompassing 1,258 million base pairs, or 
nearly half of the mouse genome. The median enhancer-to-promoter 
ratio per EPU was 5.67 (Supplementary Table 6), which is consistent 
with the idea that multiple enhancers may be used to regulate a gene”’. 
We confirmed that previously defined enhancer-promoter pairs are 
frequently located within the same EPU. For example, out of the 2,605 
putative enhancer-promoter pairs recently defined in the human 
genome”, most of their mouse homologues are found within the same 
EPU (83.8% observed versus 43% expected; P < 2.2 × 10^-16, Fisher’s
exact test). In addition, each of the four linked enhancer-promoter 
pairs reported by a recent study** was found within the same EPU. 
Finally, seven locus-control regions for Hbb genes were all identified 
within the same EPU”. 
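The nearest-TSS baseline and the SCC criterion used to judge it can be made concrete with a short sketch. It assumes a single chromosome with sorted TSS coordinates and uses made-up H3K4me1 and Pol II values for five tissues rather than the 19 profiled here. The Spearman coefficient is computed as the Pearson correlation of average ranks. This is not the authors' pipeline, and it implements neither the CTCF block model nor the EPU algorithm.

```python
import bisect
import math
from typing import List, Sequence

def nearest_tss(enhancer_pos: int, tss_positions: Sequence[int]) -> int:
    """Nearest-TSS model: return the TSS closest to the enhancer.

    tss_positions must be sorted and non-empty (one chromosome).
    """
    i = bisect.bisect_left(tss_positions, enhancer_pos)
    neighbours = tss_positions[max(0, i - 1): i + 1]
    return min(neighbours, key=lambda t: abs(t - enhancer_pos))

def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)  # undefined if either signal is constant

# Hypothetical data for one chromosome and five tissues.
tss_positions = [10_000, 55_000, 120_000]
pol2_at_tss = {
    10_000: [0.4, 2.2, 0.1, 1.8, 0.6],
    55_000: [1.5, 0.2, 4.0, 0.9, 2.8],
    120_000: [3.0, 3.1, 2.9, 3.2, 3.0],
}
enhancer_pos = 48_000
h3k4me1_at_enhancer = [2.1, 0.3, 5.4, 1.0, 3.3]

target = nearest_tss(enhancer_pos, tss_positions)
print(target, spearman(h3k4me1_at_enhancer, pol2_at_tss[target]))
```

A pair whose signals rise and fall together across tissues scores near 1. A negative SCC flags a possibly incorrect assignment, as described above.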

The discovery of EPUs provides strong evidence that the genome is 
partitioned into functional domains in which cis-regulatory elements 
are coordinately regulated, whereas elements located in different 
domains are relatively insulated from each other. This organization 
is reminiscent of recently identified topological domains, defined by 
chromatin interactions, in the mammalian genome’*’’. Indeed, 
comparison of the EPUs with the higher order chromatin organization 
shows that physical partitioning of the genome is highly correlated 
with functional partitioning on the basis of the coordinated activities of 
cis-regulatory sequences (Fig. 3d and Supplementary Fig. 7). 

EPUs provide a new approach for associating enhancers with their 
target genes. Instead of being linked to the nearest genes, an enhancer 
could be assigned to one or more promoters within an EPU that show 
significant correlation. To validate the enhancer-promoter relation- 
ship predicted by this approach (Supplementary Table 7), we examined 
long-range looping interactions between the enhancers and promoters, 
reasoning that true enhancer-promoter target pairs should have higher 
interaction frequencies than neighbouring non-target sites. We performed
chromosome conformation capture (3C) experiments for five
enhancer-promoter pairs predicted to be linked in the cortex but not in
mouse ES cells, and two enhancer-promoter pairs predicted not to be 
linked in either tissue or cell type. The five linked pairs showed enrich- 
ment of 3C signals, whereas the two non-linked pairs did not, 
indicating that the EPU analysis can accurately reveal an enhancer-promoter
targeting relationship (Supplementary Fig. 8 and Supplementary Table 8).
For a systematic evaluation of the enhancer-promoter pairing relationships
as defined by this approach, we
examined long-range looping interactions in adult mouse cortex 
genome-wide by using the Hi-C method’. We observed that inter- 
actions between predicted enhancer-promoter pairs within the same 
EPUs occured significantly more frequently than interactions between 
enhancer-promoter pairs of the same genomic distance but across 
different EPUs or by random chance (Fig. 3e; P<2.2x 10 "°, 
Wilcoxon test). These results suggest that EPUs may help in assigning 
enhancers to their target promoters. 
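To illustrate assigning an enhancer to correlated promoters within its EPU rather than to the nearest gene, the sketch below correlates activity profiles across tissue and cell types. Several choices here are assumptions for illustration only: the activity measure (for example a histone-mark signal per tissue), the use of Pearson correlation, the cutoff `r_min = 0.5`, and the helper name `assign_targets`. The criteria actually used are given in the Supplementary Methods.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either profile is constant."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same tissues")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def assign_targets(
    enhancers: Dict[str, Sequence[float]],
    promoters: Dict[str, Sequence[float]],
    r_min: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, List[str]]:
    """Link each enhancer to the promoters of the same EPU whose activity profiles
    across tissues correlate with its own at r >= r_min (strongest first)."""
    targets: Dict[str, List[str]] = {}
    for enh, enh_profile in enhancers.items():
        scored = [(pearson(enh_profile, prom_profile), prom)
                  for prom, prom_profile in promoters.items()]
        hits = sorted((s for s in scored if s[0] >= r_min),
                      key=lambda s: s[0], reverse=True)
        targets[enh] = [prom for _, prom in hits]
    return targets


# Toy EPU: activity of one enhancer and two promoters across four tissues.
enhancers = {"enh1": [1.0, 5.0, 0.5, 4.0]}
promoters = {"geneA": [2.0, 10.0, 1.0, 8.0], "geneB": [5.0, 1.0, 6.0, 0.5]}
print(assign_targets(enhancers, promoters))  # prints {'enh1': ['geneA']}
```

Because correlation is computed only among elements of the same EPU, an enhancer can be linked to a promoter that is not the closest one, and to more than one promoter.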
Mammalian development requires a precise temporal gene expression
program that is tightly controlled by transcription factors and
cis-regulatory elements. The map of cis-regulatory sequences now
provides an opportunity to analyse the potential mechanisms
involved in the temporal regulation of gene expression. First, we identified
enhancers specific to embryonic and adult brain on the basis of
H3K4me1 intensities (Fig. 4a). We observed that the former class
was associated with genes involved in neuron differentiation and
neuron development, whereas the latter was associated with genes
important for adult brain functions, for example the transmission of
nerve impulses (Fig. 4b, c and Supplementary Fig. 9). We made
similar observations for stage-specific enhancers in liver and heart
(Supplementary Figs 9 and 10).
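The rule used to call embryonic-specific and adult-specific enhancers from H3K4me1 intensities (Fig. 4a) is not stated here. A minimal sketch of one plausible rule is a log2 fold-change cutoff between the two stages. The sketch assumes normalized intensities, a hypothetical twofold cutoff and a pseudocount of 1. The name `classify_stage_specific` is illustrative only.

```python
from math import log2
from typing import Dict, List


def classify_stage_specific(
    embryonic: Dict[str, float],
    adult: Dict[str, float],
    min_fold: float = 2.0,     # hypothetical fold-change cutoff
    pseudocount: float = 1.0,  # avoids division by zero when a signal is absent
) -> Dict[str, List[str]]:
    """Split enhancers into embryonic-specific, adult-specific and shared groups
    by the log2 ratio of H3K4me1 intensity between the two stages."""
    cutoff = log2(min_fold)
    groups: Dict[str, List[str]] = {"embryonic": [], "adult": [], "shared": []}
    # Only enhancers measured at both stages are classified; sorted for stable output.
    for enh in sorted(embryonic.keys() & adult.keys()):
        lfc = log2((embryonic[enh] + pseudocount) / (adult[enh] + pseudocount))
        if lfc >= cutoff:
            groups["embryonic"].append(enh)
        elif lfc <= -cutoff:
            groups["adult"].append(enh)
        else:
            groups["shared"].append(enh)
    return groups


# Toy normalized H3K4me1 intensities at three enhancers in E14.5 and adult brain.
e14_5 = {"e1": 40.0, "e2": 3.0, "e3": 10.0}
adult_brain = {"e1": 4.0, "e2": 30.0, "e3": 12.0}
print(classify_stage_specific(e14_5, adult_brain))
# prints {'embryonic': ['e1'], 'adult': ['e2'], 'shared': ['e3']}
```

The embryonic and adult groups produced this way would then be passed to Gene Ontology analysis of their associated genes, as in Fig. 4b, c.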

Figure 4 | Motif analysis of tissue-specific enhancers. a, Classification of
development stage-specific enhancers on the basis of their chromatin state
(H3K4me1) between embryonic (embryonic day 14.5; E14.5) and adult brain.
b and c, Gene Ontology analysis for the genes associated with embryonic
brain-specific enhancers and adult cortex-specific enhancers. d, Classification of
tissue-specific enhancers on the basis of their chromatin state (H3K4me1)
among different tissue and cell types. The first 19 tissue-specific clusters were
used for further motif analysis. The last cluster contains enhancers enriched in
multiple tissues with no clear patterns. e, Enrichment of three transcription
factor recognition motifs in the predicted enhancers. REST, RE1-silencing
transcription factor. f, Heatmap showing the clustering of 270 transcription
factor motifs on the basis of their enrichment in the various groups of
enhancers as identified in e. g, Boxplot showing that the de novo motifs found in
tissue-specific enhancers are evolutionarily conserved. h-k, Examples of motifs
that show high sequence conservation: h, REST motif in cortex-specific
enhancers; i, Hnf1 motif in kidney-specific enhancers; j, Oct4 motif in
mESC-specific enhancers; k, Atoh1 motif in cerebellum-specific enhancers.

We also systematically identified potential transcriptional regulators
acting on tissue-specific gene expression programs. We first defined 19
groups of tissue-specific enhancers on the basis of H3K4me1 occupancy
(Fig. 4d). Gene Ontology term analysis confirmed that the enhancers in
each group are linked to genes specifically expressed in the correspond- 
ing tissue or developmental stage (Supplementary Fig. 11). We also 
observed that the known motifs of transcription factors that have been 
reported to function in certain tissues are enriched in the tissue-specific 
enhancers from the same tissue (Fig. 4e). To identify new transcription 
factors involved in each group of tissue-specific enhancers, we per- 
formed de novo motif analysis and identified 206 motifs with a very 
stringent cutoff (P< 10°; Supplementary Tables 9 and 10). We 
found that 91% of them (188 out of 206) showed significant levels of 
evolutionary conservation among the vertebrate species (Fig. 4g, h-k). 
We annotated the most likely transcription factor for each motif by 
comparing it with public transcription factor databases and verified 
that the matching transcription factor was expressed in the corres- 
ponding tissue. A total of 62% of the conserved de novo motifs (117 
out of 188) were associated with a known transcription factor, and 75% 
of them (88 out of 117) have previously been implicated in the regu- 
lation of gene expression in specific tissues (Supplementary Tables 9 
and 11). We performed a similar motif analysis for promoters, and 
compared the top motifs enriched in promoter and enhancer 
sequences in the same tissue (Supplementary Table 12). Only 11 motifs
were shared between the two sets, whereas 93% of transcription
factor motifs enriched in the tissue-specific enhancers were
unique to enhancers, confirming that enhancers and promoters
contain different regulatory sequences, as we reported previously.
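The statistics behind the motif enrichment in tissue-specific enhancers (Fig. 4e) and the de novo motif cutoff are given in the Supplementary Information. One common way to test whether a known motif is over-represented in a group of enhancers is a one-sided hypergeometric test. The sketch assumes that each enhancer counts once, whether or not it contains the motif, and that all remaining enhancers serve as background. The helper names are hypothetical, and this is not necessarily the procedure the authors used.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(max(k, 0), top + 1))
    return tail / comb(population, draws)


def motif_enrichment(hits_fg: int, n_fg: int, hits_bg: int, n_bg: int):
    """Fold enrichment and one-sided hypergeometric P value for a motif present in
    hits_fg of n_fg tissue-specific enhancers and hits_bg of n_bg background ones."""
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    if total_hits == 0:
        return 0.0, 1.0
    fold = (hits_fg / n_fg) / (total_hits / total)
    p_value = hypergeom_sf(hits_fg, total, total_hits, n_fg)
    return fold, p_value


# Toy counts: motif in 30 of 100 tissue-specific and 70 of 900 background enhancers.
fold, p = motif_enrichment(hits_fg=30, n_fg=100, hits_bg=70, n_bg=900)
print(f"fold = {fold:.1f}, P = {p:.2e}")
```

With many motifs tested across 19 enhancer groups, the resulting P values would also need a multiple-testing correction or, as in the text, a very stringent cutoff.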

Here we have described an initial survey and a draft annotation of 
the cis-regulatory sequences in the mouse genome. The wide range of 
tissue and cell types examined in this study provides an unprecedented 
opportunity to detect tissue-specific and development-specific promoters 
and enhancers, analyses of which have yielded potential clues to 
transcriptional regulators of tissue-specific gene expression programs.
We show that nearly half of the mouse genome is organized into EPUs 
containing enhancers and promoters with correlated activities. These 
EPUs overlap significantly with recently discovered topological 
domains, defined by chromatin interactions, thus linking physical 
partitioning of the genome with transcriptional regulation. Such 
multigene structures probably represent a general feature of
genome organization in mammals. 


METHODS SUMMARY 


Mouse tissues were harvested from eight-week-old male C57BL/6 mice (Charles
River). Mouse embryonic fibroblasts were isolated from C57BL/6 embryos at
embryonic day 14.5. ChIP-Seq and RNA-Seq experiments were performed as
described previously, using Illumina GAIIx and HiSeq 2000 instruments
(details are provided in Supplementary Information). Hi-C experiments in adult
cortex were conducted as described previously. A software pipeline to process ChIP-Seq
data and predict enhancers is described in Supplementary Methods. Highly 
correlated biological replicates for ChIP-Seq experiments were pooled for all 
subsequent data analyses. An algorithm to define the enhancer-promoter unit 
is given in Supplementary Methods. 
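The Methods Summary states that highly correlated biological replicates were pooled, but it does not give the criterion. The sketch below makes three assumptions: replicates are compared as read counts in matching genomic bins, agreement is measured with Pearson correlation against a hypothetical threshold of 0.9, and pooling means summing the counts. The helper `pool_replicates` is illustrative only.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def pool_replicates(rep1: Sequence[int], rep2: Sequence[int],
                    r_min: float = 0.9) -> Tuple[List[int], float]:
    """Sum binned read counts from two replicates if they are highly correlated.

    r_min is a hypothetical threshold; a ValueError is raised for poorly
    correlated replicates instead of pooling them.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same bins")
    r = pearson(rep1, rep2)
    if r < r_min:
        raise ValueError(f"replicates poorly correlated (r = {r:.3f})")
    return [a + b for a, b in zip(rep1, rep2)], r


pooled, r = pool_replicates([10, 20, 30, 40], [12, 19, 33, 41])
print(pooled, round(r, 3))  # prints [22, 39, 63, 81] 0.992
```

Raising an error rather than silently pooling keeps discordant replicates out of downstream enhancer prediction, which matches the intent of pooling only highly correlated replicates.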